lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: about merge factor
Date Sun, 11 Feb 2007 20:14:18 GMT
Found a mistake in my reponse...when I was talking about max merge docs, 
I meant max buffered docs. If your going to optimize anyway, the key 
setting appears to be max buffered docs, and I have yet to see the merge 
factor affect anything (again, only if you optimize). Oddly, performance 
seems to decrease as you up max buffered docs far before you are even 
close to running out of available ram. I do not know why this is, but 
you should certainly test to see what your prime settings are.

Also, the knew benchmarking stuff is awesome.

- Mark

Grant Ingersoll wrote:
> You may find contrib/Benchmark useful in your testing.  Doron Cohen 
> has added a nice framework for scripting benchmarking tests.
> -Grant
> On Feb 11, 2007, at 12:14 PM, Mark Miller wrote:
>> Not sensible at all. First, a merge above something like 90 most 
>> likely never makes since. Second, I have done some testing and my 
>> results show that if you optimize the index after loading, the merge 
>> factor really doesn't matter so keep it at 10 (I never used a max 
>> merge docs below 50. 100 worked best, 1,000 and 2,000 slowed things 
>> down even though the test had access to 600MB RAM and the docs where 
>> around 10-20k each). Setting up a test harness that automatically 
>> indexes a good amount of docs (I did 20,000) with a variety of 
>> settings will tell you a lot. Things will obviously bend based on 
>> your setup.
>> - Mark
>> maureen tanuwidjaja wrote:
>>> Hi all,
>>>     I just wondering wheter is it sensible and possible if I have 
>>> 660,000  documents to be indexed,I set the merge factor to 660,000 
>>> instead of  the default value 10 (...and this means no merge while 
>>> indexing) and  later after closing the index,I use the IndexWriter 
>>> to optimize/merge  the whole index file...
>>>       Thanks and Regards,
>>>   Maureen
>>>    ---------------------------------
>>> We won't tell. Get more on shows you hate to love
>>> (and love to hate): Yahoo! TV's Guilty Pleasures list.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> Read the Lucene Java FAQ at 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message