lucene-solr-user mailing list archives

From Giovanni Fernandez-Kincade <gfernandez-kinc...@capitaliq.com>
Subject RE: Solr Timeouts
Date Tue, 06 Oct 2009 20:49:09 GMT
Yeah, that's exactly right, Mark.

What does the "maxCommitsToKeep" parameter (from SolrDeletionPolicy in SolrConfig.xml) actually do? Increasing this value seems to have helped a little, but I'm wary of cranking it up without a better understanding of what it does.
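
For reference, the block I've been adjusting looks roughly like this - the values here are just placeholders rather than my exact settings:

  <deletionPolicy class="solr.SolrDeletionPolicy">
    <!-- maximum number of commit points kept before older ones are deleted -->
    <str name="maxCommitsToKeep">1</str>
    <!-- maximum number of optimized commit points to keep -->
    <str name="maxOptimizedCommitsToKeep">0</str>
    <!-- optionally, delete commit points once they reach this age -->
    <!-- <str name="maxCommitAge">1DAY</str> -->
  </deletionPolicy>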

-----Original Message-----
From: Mark Miller [mailto:markrmiller@gmail.com]
Sent: Tuesday, October 06, 2009 4:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Timeouts

It sounds like he is indexing on a local disk, but reading the files to be indexed from NFS - which would be fine.

You can get Lucene indexes to work on NFS (though still not recommended), but you need to use a custom IndexDeletionPolicy to keep older commit points around longer, and be sure not to use NIOFSDirectory.

Feak, Todd wrote:
> I seem to recall hearing something about *not* putting a Solr index directory on an NFS mount. Might want to search on that.
>
> That, of course, doesn't have anything to do with commits showing up unexpectedly in stack traces, per your original email.
>
> -Todd
>
> -----Original Message-----
> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kincade@capitaliq.com]
> Sent: Tuesday, October 06, 2009 12:39 PM
> To: solr-user@lucene.apache.org; yonik@lucidimagination.com
> Subject: RE: Solr Timeouts
>
> That thread was blocking for an hour while all other threads were idle or blocked.
>
> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
> Sent: Tuesday, October 06, 2009 3:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Timeouts
>
> This specific thread was blocked for an hour?
> If so, I'd echo Lance... this is a local disk right?
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Oct 5, 2009 at 2:11 PM, Giovanni Fernandez-Kincade
> <gfernandez-kincade@capitaliq.com> wrote:
>
>> I just grabbed another stack trace for a thread that has been similarly blocking for over an hour. Notice that there is no Commit in this one:
>>
>> http-8080-Processor67 [RUNNABLE] CPU time: 1:02:05
>> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>> org.apache.lucene.index.SegmentTermEnum.next()
>> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
>> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
>> org.apache.lucene.index.TermInfosReader.get(Term)
>> org.apache.lucene.index.SegmentTermDocs.seek(Term)
>> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
>> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
>> org.apache.lucene.index.IndexWriter.applyDeletes()
>> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
>> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
>> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
>> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document, Analyzer)
>> org.apache.lucene.index.IndexWriter.updateDocument(Term, Document)
>> org.apache.solr.update.DirectUpdateHandler2.addDoc(AddUpdateCommand)
>> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(AddUpdateCommand)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(SolrContentHandler, AddUpdateCommand)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(SolrContentHandler)
>> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(SolrQueryRequest, SolrQueryResponse, ContentStream)
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
>> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
>> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
>> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
>> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
>> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
>> java.lang.Thread.run()
>>
>>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
>> Sent: Monday, October 05, 2009 1:18 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr Timeouts
>>
>> OK... next step is to verify that SolrCell doesn't have a bug that
>> causes it to commit.
>> I'll try and verify today unless someone else beats me to it.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade
>> <gfernandez-kincade@capitaliq.com> wrote:
>>
>>> I'm fairly certain that all of the indexing jobs are calling SOLR with commit=false. They all construct the indexing URLs using a CLR function I wrote, which takes a Commit parameter that is always set to false.
>>>
>>> Also, I don't see any calls to commit in the Tomcat logs (whereas normally when I make a commit call, I do).
>>>
>>> This suggests that Solr is doing it automatically, but the extract handler doesn't seem to be the problem:
>>>  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy">
>>>    <lst name="defaults">
>>>      <str name="uprefix">ignored_</str>
>>>      <str name="map.content">fileData</str>
>>>    </lst>
>>>  </requestHandler>
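>>>
>>> Just to make it concrete, a commit-triggering default would look something like the following - this is purely illustrative and is not in my config. A "commit" or "commitWithin" entry under defaults would force a commit on every request, or one shortly after it:
>>>
>>>    <lst name="defaults">
>>>      <!-- hypothetical defaults that would cause commits; not present in my handler -->
>>>      <str name="commit">true</str>
>>>      <str name="commitWithin">1000</str>
>>>    </lst>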
>>>
>>>
>>> There is no external config file specified, and I don't see anything about commits here.
>>>
>>> I've tried setting up more detailed indexer logging, but haven't been able to get it to work:
>>> <infoStream file="c:\solr\indexer.log">true</infoStream>
>>>
>>> I tried relative and absolute paths, but no dice so far.
>>>
>>> Any other ideas?
>>>
>>> -Gio.
>>>
>>> -----Original Message-----
>>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik Seeley
>>> Sent: Monday, October 05, 2009 12:52 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr Timeouts
>>>
>>>
>>>> This is what one of my SOLR requests looks like:
>>>>
>>>> http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false
>>>>
>>> Have you verified that all of your indexing jobs (you said you had 4
>>> or 5) have commit=false?
>>>
>>> Also make sure that your extract handler doesn't have a default of
>>> something that could cause a commit - like commitWithin or something.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>>
>>> On Mon, Oct 5, 2009 at 12:44 PM, Giovanni Fernandez-Kincade
>>> <gfernandez-kincade@capitaliq.com> wrote:
>>>
>>>> Is there anywhere other than solrConfig.xml where the autoCommit feature could be enabled? I've looked through that file and found autoCommit commented out:
>>>>
>>>>
>>>>
>>>> <!--
>>>>
>>>>  Perform a <commit/> automatically under certain conditions:
>>>>
>>>>         maxDocs - number of updates since last commit is greater than this
>>>>
>>>>         maxTime - oldest uncommitted update (in ms) is this long ago
>>>>
>>>>    <autoCommit>
>>>>
>>>>      <maxDocs>10000</maxDocs>
>>>>
>>>>      <maxTime>1000</maxTime>
>>>>
>>>>    </autoCommit>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  -->
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Feak, Todd [mailto:Todd.Feak@smss.sony.com]
>>>> Sent: Monday, October 05, 2009 12:40 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: RE: Solr Timeouts
>>>>
>>>>
>>>>
>>>> Actually, ignore my other response.
>>>>
>>>>
>>>>
>>>> I believe you are committing, whether you know it or not.
>>>>
>>>>
>>>>
>>>> This is in the stack trace you provided:
>>>>
>>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
>>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
>>>>
>>>>
>>>>
>>>> I think Yonik gave you additional information on how to make it faster.
>>>>
>>>>
>>>>
>>>> -Todd
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kincade@capitaliq.com]
>>>>
>>>> Sent: Monday, October 05, 2009 9:30 AM
>>>>
>>>> To: solr-user@lucene.apache.org
>>>>
>>>> Subject: RE: Solr Timeouts
>>>>
>>>>
>>>>
>>>> I'm not committing at all, actually - I'm waiting for all 6 million to be done.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>>
>>>> From: Feak, Todd [mailto:Todd.Feak@smss.sony.com]
>>>>
>>>> Sent: Monday, October 05, 2009 12:10 PM
>>>>
>>>> To: solr-user@lucene.apache.org
>>>>
>>>> Subject: RE: Solr Timeouts
>>>>
>>>>
>>>>
>>>> How often are you committing?
>>>>
>>>>
>>>>
>>>> Every time you commit, Solr will close the old index and open the new one. If you are doing this in parallel from multiple jobs (the 4-5 you mention), then eventually the server gets behind and commit requests start to pile up. Once this starts to happen, it will cascade out of control if the rate of commits isn't slowed.
>>>>
>>>>
>>>>
>>>> -Todd
>>>>
>>>>
>>>>
>>>> ________________________________
>>>>
>>>> From: Giovanni Fernandez-Kincade [mailto:gfernandez-kincade@capitaliq.com]
>>>>
>>>> Sent: Monday, October 05, 2009 9:04 AM
>>>>
>>>> To: solr-user@lucene.apache.org
>>>>
>>>> Subject: Solr Timeouts
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm attempting to index approximately 6 million HTML/text files using SOLR 1.4/Tomcat 6 on Windows Server 2003 x64, running a 64-bit Tomcat and JVM. I've fired up 4-5 different jobs that are making indexing requests using the ExtractingRequestHandler, and everything works well for about 30-40 minutes, after which all indexing requests start timing out. I profiled the server and found that all of the threads are getting blocked by this call to flush the Lucene index to disk (see below).
>>>>
>>>>
>>>>
>>>> This leads me to a few questions:
>>>>
>>>>
>>>>
>>>> 1. Is this normal?
>>>>
>>>>
>>>>
>>>> 2. Can I reduce the frequency with which this happens somehow? I've greatly increased the indexing options in SolrConfig.xml (attached here) to no avail - the kind of settings I mean are sketched after this list.
>>>>
>>>>
>>>>
>>>> 3. During these flushes, resource utilization (CPU, I/O, memory consumption) is significantly lower than when requests are being handled. Is there any way to make this indexing go faster? I have plenty of bandwidth on the machine.
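>>>>
>>>> To be concrete about question 2, these are the sort of indexing knobs I've been raising in SolrConfig.xml - the values below are only illustrative, not my actual settings:
>>>>
>>>>    <indexDefaults>
>>>>      <!-- RAM the in-memory buffer may use before documents are flushed to a new segment -->
>>>>      <ramBufferSizeMB>256</ramBufferSizeMB>
>>>>      <!-- how many similarly sized segments accumulate before they are merged -->
>>>>      <mergeFactor>10</mergeFactor>
>>>>    </indexDefaults>
>>>>
>>>> (Raising ramBufferSizeMB should mean fewer, larger flushes, which is why I expected it to help.)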
>>>>
>>>>
>>>>
>>>> I appreciate any insight you can provide. We're currently using MS SQL 2005 as our full-text solution and are pretty much miserable. So far SOLR has been a great experience.
>>>>
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Gio.
>>>>
>>>>
>>>>
>>>> http-8080-Processor21 [RUNNABLE] CPU time: 9:51
>>>>
>>>> java.io.RandomAccessFile.seek(long)
>>>>
>>>> org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
>>>>
>>>> org.apache.lucene.store.BufferedIndexInput.refill()
>>>>
>>>> org.apache.lucene.store.BufferedIndexInput.readByte()
>>>>
>>>> org.apache.lucene.store.IndexInput.readVInt()
>>>>
>>>> org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
>>>>
>>>> org.apache.lucene.index.SegmentTermEnum.next()
>>>>
>>>> org.apache.lucene.index.SegmentTermEnum.scanTo(Term)
>>>>
>>>> org.apache.lucene.index.TermInfosReader.get(Term, boolean)
>>>>
>>>> org.apache.lucene.index.TermInfosReader.get(Term)
>>>>
>>>> org.apache.lucene.index.SegmentTermDocs.seek(Term)
>>>>
>>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(IndexReader, int)
>>>>
>>>> org.apache.lucene.index.DocumentsWriter.applyDeletes(SegmentInfos)
>>>>
>>>> org.apache.lucene.index.IndexWriter.applyDeletes()
>>>>
>>>> org.apache.lucene.index.IndexWriter.doFlushInternal(boolean, boolean)
>>>>
>>>> org.apache.lucene.index.IndexWriter.doFlush(boolean, boolean)
>>>>
>>>> org.apache.lucene.index.IndexWriter.flush(boolean, boolean, boolean)
>>>>
>>>> org.apache.lucene.index.IndexWriter.closeInternal(boolean)
>>>>
>>>> org.apache.lucene.index.IndexWriter.close(boolean)
>>>>
>>>> org.apache.lucene.index.IndexWriter.close()
>>>>
>>>> org.apache.solr.update.SolrIndexWriter.close()
>>>>
>>>> org.apache.solr.update.DirectUpdateHandler2.closeWriter()
>>>>
>>>> org.apache.solr.update.DirectUpdateHandler2.commit(CommitUpdateCommand)
>>>>
>>>> org.apache.solr.update.processor.RunUpdateProcessor.processCommit(CommitUpdateCommand)
>>>>
>>>> org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor, SolrParams, boolean)
>>>>
>>>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)
>>>>
>>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(SolrQueryRequest, SolrQueryResponse)
>>>>
>>>> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(SolrQueryRequest, SolrQueryResponse)
>>>>
>>>> org.apache.solr.core.SolrCore.execute(SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>>>>
>>>> org.apache.solr.servlet.SolrDispatchFilter.execute(HttpServletRequest, SolrRequestHandler, SolrQueryRequest, SolrQueryResponse)
>>>>
>>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(ServletRequest, ServletResponse, FilterChain)
>>>>
>>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ServletRequest, ServletResponse)
>>>>
>>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ServletRequest, ServletResponse)
>>>>
>>>> org.apache.catalina.core.StandardWrapperValve.invoke(Request, Response)
>>>>
>>>> org.apache.catalina.core.StandardContextValve.invoke(Request, Response)
>>>>
>>>> org.apache.catalina.core.StandardHostValve.invoke(Request, Response)
>>>>
>>>> org.apache.catalina.valves.ErrorReportValve.invoke(Request, Response)
>>>>
>>>> org.apache.catalina.core.StandardEngineValve.invoke(Request, Response)
>>>>
>>>> org.apache.catalina.connector.CoyoteAdapter.service(Request, Response)
>>>>
>>>> org.apache.coyote.http11.Http11Processor.process(InputStream, OutputStream)
>>>>
>>>> org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(TcpConnection, Object[])
>>>>
>>>> org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(Socket, TcpConnection, Object[])
>>>>
>>>> org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(Object[])
>>>>
>>>> org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run()
>>>>
>>>> java.lang.Thread.run()
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>
>
>


--
- Mark

http://www.lucidimagination.com



