lucene-solr-user mailing list archives

From Giovanni Fernandez-Kincade <gfernandez-kinc...@capitaliq.com>
Subject Lucene Merge Threads
Date Mon, 12 Oct 2009 23:05:20 GMT
Hi,
I'm attempting to optimize a pretty large index, and even though the optimize request timed
out, I watched it using a profiler and saw that the optimize thread continued executing. Eventually
it completed, but in the background I still see a thread performing a merge:

Lucene Merge Thread #0 [RUNNABLE, IN_NATIVE] CPU time: 17:51
java.io.RandomAccessFile.readBytes(byte[], int, int)
java.io.RandomAccessFile.read(byte[], int, int)
org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(byte[], int, int)
org.apache.lucene.store.BufferedIndexInput.refill()
org.apache.lucene.store.BufferedIndexInput.readByte()
org.apache.lucene.store.IndexInput.readVInt()
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
org.apache.lucene.index.SegmentTermEnum.next()
org.apache.lucene.index.SegmentMergeInfo.next()
org.apache.lucene.index.SegmentMerger.mergeTermInfos(FormatPostingsFieldsConsumer)
org.apache.lucene.index.SegmentMerger.mergeTerms()
org.apache.lucene.index.SegmentMerger.merge(boolean)
org.apache.lucene.index.IndexWriter.mergeMiddle(MergePolicy$OneMerge)
org.apache.lucene.index.IndexWriter.merge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run()


This merge has been running for quite a while and isn't fully utilizing the machine's resources.
After looking at the Lucene source, I noticed that ConcurrentMergeScheduler has a MaxThreadCount
parameter. Is this parameter exposed by Solr somehow? I see the class mentioned, commented
out, in my solrconfig.xml, but I'm not sure of the correct way to specify the parameter:

    <!--
     Expert:
     The Merge Scheduler in Lucene controls how merges are performed.  The
     ConcurrentMergeScheduler (Lucene 2.3 default) can perform merges in the
     background using separate threads.  The SerialMergeScheduler (Lucene 2.2
     default) does not.
     -->
    <!--<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>-->


Also, if I can specify this parameter, is it safe to just stop and restart my servlet container
(Tomcat) mid-merge?
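
For reference, here is a minimal sketch of how one might check for live merge threads without
attaching a profiler. It only sees threads in the JVM it runs in, so it would have to be invoked
from inside the Tomcat process. It assumes merge threads always carry the "Lucene Merge Thread"
name prefix seen in the dump above; the class and method names are hypothetical, not part of
Lucene or Solr.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists live threads whose name looks like a Lucene merge thread.
class MergeThreadCheck {
    // Assumption: prefix taken from the thread name in the profiler dump ("Lucene Merge Thread #0").
    static final String MERGE_THREAD_PREFIX = "Lucene Merge Thread";

    static List<String> liveMergeThreads() {
        List<String> found = new ArrayList<>();
        // getAllStackTraces() returns a snapshot of the live threads in this JVM only.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && t.getName().startsWith(MERGE_THREAD_PREFIX)) {
                found.add(t.getName() + " [" + t.getState() + "]");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> merges = liveMergeThreads();
        if (merges.isEmpty()) {
            System.out.println("No merge threads running in this JVM.");
        } else {
            for (String m : merges) {
                System.out.println("Still merging: " + m);
            }
        }
    }
}
```

This only reports whether a merge appears to be in progress; it says nothing about whether
stopping the server at that moment is safe.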

Thanks in advance,
Gio.
