lucene-java-user mailing list archives

From Tomer Gabel
Subject Re: Performance of never optimizing
Date Wed, 05 Nov 2008 14:47:04 GMT

Justus Pendleton-2 wrote:
> 1. Why does the merge factor of 4 appear to be faster than the merge  
> factor of 2?
> 2. Why does non-optimized searching appear to be faster than optimized  
> searching once the index hits ~500,000 documents?
> 3. There appears to be a fairly sizable performance drop across the  
> board around 450,000 documents. Why is that?

Hi Justus,

1. Higher merge factor => more segments. Lucene (which version are you
using, by the way?) only keeps a single file handle per physical file per
index reader; if your benchmark is multi-threaded, more concurrently active
segments mean more file handles. Since you're using an 8-core Mac Pro, I
also assume you have some sort of RAID setup, so your storage subsystem can
physically handle more than one concurrent request. That parallelism can
only come into play with multiple segments.
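
To illustrate the file-handle point, here is a minimal Java sketch, not
Lucene's actual code. It assumes each segment file is read through one shared
RandomAccessFile, so a seek followed by a read has to be guarded by a lock.
The names SharedHandleSketch, SegmentFile and concurrentReads are hypothetical
and exist only for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names are Lucene classes.
final class SharedHandleSketch {

    /** One open handle per physical file, shared by every reading thread. */
    static final class SegmentFile implements AutoCloseable {
        private final RandomAccessFile handle;

        SegmentFile(Path path) throws IOException {
            handle = new RandomAccessFile(path.toFile(), "r");
        }

        /** Fills the buffer from the given offset; throws EOFException if the file is too short. */
        void readAt(long position, byte[] buffer) throws IOException {
            // seek + read must be atomic on a shared handle, so callers queue up here.
            synchronized (handle) {
                handle.seek(position);
                handle.readFully(buffer);
            }
        }

        @Override
        public void close() throws IOException {
            handle.close();
        }
    }

    /** Starts the given number of reader threads, assigned round-robin to the files; returns bytes read. */
    static long concurrentReads(List<SegmentFile> segments, int threads,
                                int readsPerThread, int blockSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                SegmentFile segment = segments.get(t % segments.size());
                results.add(pool.submit(() -> {
                    byte[] buffer = new byte[blockSize];
                    long total = 0;
                    for (int i = 0; i < readsPerThread; i++) {
                        segment.readAt(0, buffer);
                        total += buffer.length;
                    }
                    return total;
                }));
            }
            long sum = 0;
            for (Future<Long> result : results) {
                sum += result.get();
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single SegmentFile every thread waits on the same lock; with several,
threads assigned to different files do not block each other. Timing
concurrentReads with one file versus several is one rough way to see whether
this matters on a given machine, although reads served from the OS cache may
hide most of the difference.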

2. Same explanation as above - an optimized index has only one segment, and
contention on the file handle can actually become a bottleneck past a
certain threshold. A merge factor of 2 leaves you with very few segments
even for a non-optimized index, which is why the performance of a
non-optimized, 2-factor index is very close to that of the optimized index.
The optimal merge-factor in this case will probably be a function of the
complexity of your RAID setup (NAS devices can easily utilize dozens of
physical drives, giving a measurable benefit to multiple concurrently active
segments), but I expect your setup won't seriously benefit from an increase
in the merge factor because it probably uses four or fewer physical drives.
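
As a rough way to see why a merge factor of 2 leaves so few segments, here is
a small Java sketch of an idealized model. It assumes every flush creates one
level-0 segment, and that whenever mergeFactor segments of the same level
exist they are merged into a single segment of the next level. Under that
assumption the segment count after N flushes is the digit sum of N written in
base mergeFactor. The class and method names are hypothetical, and the real
merge policy has more knobs than this model covers.

```java
// Idealized model only; this is not Lucene's merge policy implementation.
final class SegmentCountModel {

    /**
     * Number of live segments after the given number of flushes, assuming
     * mergeFactor same-level segments always collapse into one segment of
     * the next level (the digit sum of flushes in base mergeFactor).
     */
    static int segmentCount(long flushes, int mergeFactor) {
        if (flushes < 0 || mergeFactor < 2) {
            throw new IllegalArgumentException("flushes must be >= 0 and mergeFactor >= 2");
        }
        int segments = 0;
        long remaining = flushes;
        while (remaining > 0) {
            // Each base-mergeFactor digit is the number of segments at that level.
            segments += (int) (remaining % mergeFactor);
            remaining /= mergeFactor;
        }
        return segments;
    }
}
```

Under this model, 255 flushes leave 8 segments with a merge factor of 2 and
12 segments with a merge factor of 4. The count is not strictly increasing in
the merge factor for every N, but the upper bound of (mergeFactor - 1)
segments per level grows with it.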

3. This is trickier; my guess is that until that point most of the
term-frequency data (.frq) is small enough to be kept fully in the disk read
cache, and beyond that point considerably more I/O is actually performed by
the storage subsystem. This can probably be measured with tools available
in the OS of your choice, if you wish to corroborate this theory (I'd
certainly be interested in the results).
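
One simple way to start checking the cache theory is to track how large the
.frq files are as the index grows and compare that with the memory the OS has
available for its read cache. The Java sketch below only sums file sizes in a
directory. It assumes the index is not stored in compound-file form (where
the .frq data sits inside a .cfs file), and FrqSizeSketch and
totalBytesWithSuffix are hypothetical names for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; it looks only at file names and sizes on disk.
final class FrqSizeSketch {

    /** Sums the sizes of regular files in indexDir whose names end with the given suffix. */
    static long totalBytesWithSuffix(Path indexDir, String suffix) throws IOException {
        long total = 0;
        // The glob "*<suffix>" matches on the file name, e.g. "*.frq".
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*" + suffix)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    total += Files.size(file);
                }
            }
        }
        return total;
    }
}
```

Calling totalBytesWithSuffix(indexDir, ".frq") at several index sizes around
450,000 documents would show whether the drop coincides with the
term-frequency data outgrowing the cache. It does not measure actual disk
I/O, so OS-level tools are still needed to confirm it.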

Best of luck,

-- Tomer Gabel 
