lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Lucene indexing speed on NVMe drive
Date Fri, 01 May 2015 13:46:14 GMT
Hyper-threading should help Lucene indexing go faster when it's not
IO bound ... I found 20 threads (on 12 real cores, 24 with HT) to be
fastest in the nightly benchmark.

But it's curious you're unable to saturate either CPU or IO, with 20
real cores and NVMe storage.  200 GB/hour isn't that much better than
what we see on the nightly benchmark on 1 KB docs (~160 GB/hour),
though those cores are 3.33 GHz (2 socket Intel Xeon X5680).
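
To put those two numbers side by side, here is a rough back-of-envelope
sketch.  It converts GB/hour to MB/s (decimal units assumed) and divides
by cores times clock speed.  The normalization is only a crude yardstick:
it ignores hyper-threading, CPU generation and memory bandwidth, and the
class and method names are made up for illustration.

```java
public class ThroughputMath {

    /** Converts GB/hour to MB/s, assuming decimal units (1 GB = 1000 MB). */
    public static double mbPerSecond(double gbPerHour) {
        return gbPerHour * 1000.0 / 3600.0;
    }

    /** Crude normalization: GB/hour divided by (physical cores * clock in GHz). */
    public static double gbPerHourPerCoreGHz(double gbPerHour, int cores, double ghz) {
        return gbPerHour / (cores * ghz);
    }

    public static void main(String[] args) {
        // Figures taken from this thread: 20 cores at 2.5 GHz vs 12 cores at 3.33 GHz.
        System.out.printf("reported run: %.1f MB/s, %.2f GB/hour per core-GHz%n",
                mbPerSecond(200), gbPerHourPerCoreGHz(200, 20, 2.5));
        System.out.printf("nightly:      %.1f MB/s, %.2f GB/hour per core-GHz%n",
                mbPerSecond(160), gbPerHourPerCoreGHz(160, 12, 3.33));
    }
}
```

200 GB/hour of source text is roughly 56 MB/s, far below the 800 MB/s
spikes mentioned in the original message, though the index writes
themselves (flushes plus merges) will be a multiple of the source rate.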

Where is the source line docs file stored?  Maybe pulling the lines
from it is a bottleneck?
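
One way to check that in isolation is to time a plain pass over the line
docs file with no indexing at all.  The sketch below is a minimal,
standalone example, not part of the benchmark code: the class name is
made up, it assumes the file is UTF-8 text with "\n" line endings, and a
single reader thread is only an approximation of how the benchmark pulls
lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LineFileReadRate {

    /**
     * Reads every line and returns the approximate number of bytes consumed:
     * the UTF-8 length of each line plus one byte for its newline.
     */
    public static long countBytes(Path file) throws IOException {
        long bytes = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                bytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
            }
        }
        return bytes;
    }

    /** Converts a byte count and elapsed time into GB/hour (decimal GB). */
    public static double gbPerHour(long bytes, long elapsedNanos) {
        double seconds = elapsedNanos / 1e9;
        return (bytes / 1e9) / (seconds / 3600.0);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]); // path to the line docs file
        long start = System.nanoTime();
        long bytes = countBytes(file);
        long elapsed = System.nanoTime() - start;
        System.out.printf("read %d bytes at %.1f GB/hour%n", bytes, gbPerHour(bytes, elapsed));
    }
}
```

If the rate this reports is not comfortably above 200 GB/hour, the
source file is a plausible suspect.  Note that a second run may be served
from the OS page cache and look much faster than the first.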

Can you try running with 20 threads under a profiler and post the
results?  Or maybe capture thread stack for all threads multiple times
throughout the indexing run, so we can see where the threads are?
Might give a clue ...
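
Running the JDK's jstack tool against the indexing process every so
often is the simplest way to get those stacks.  If you'd rather do it
from inside the JVM, something like the sketch below could work.  It is
only an illustration: the class name and the thread-name prefix are
hypothetical, and it assumes you can add a call to start() to the
indexing process.

```java
import java.util.Map;
import java.util.TreeMap;

public class StackSampler {

    /**
     * Takes one sample: for live threads whose name starts with the given
     * prefix, counts how many share the same state and top stack frame.
     */
    public static Map<String, Integer> sampleTopFrames(String threadNamePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] stack = e.getValue();
            if (!t.getName().startsWith(threadNamePrefix) || stack.length == 0) {
                continue;
            }
            String key = t.getState() + " " + stack[0];
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** Starts a daemon thread that prints one sample to stderr per interval. */
    public static Thread start(String threadNamePrefix, long intervalMillis) {
        Thread sampler = new Thread(() -> {
            try {
                while (true) {
                    System.err.println("--- stack sample ---");
                    sampleTopFrames(threadNamePrefix)
                            .forEach((frame, n) -> System.err.println(n + "x " + frame));
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException ie) {
                // Interrupted: stop sampling.
            }
        }, "stack-sampler");
        sampler.setDaemon(true);
        sampler.start();
        return sampler;
    }
}
```

For example, StackSampler.start("Index", 10_000) would print a summary
every 10 seconds for threads whose names begin with "Index" (the prefix
is a guess; use whatever the indexing threads are actually called).  If
most threads are BLOCKED or WAITING on the same frame, that frame is
worth a closer look.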

Separately, you could turn on verbose output in the Indexer, and
IndexWriter will produce lots of output about what happened ... maybe
there is something surprising, e.g. merges falling behind and stalling
indexing.
The nightly index doesn't wait for merges to finish at the end by
default; it could be that if you change that, you'd see speedups
from NVMe.

Mike McCandless

On Thu, Apr 30, 2015 at 2:25 PM, Anahita Shayesteh-SSI wrote:
> Hi. I am studying Lucene performance and in particular how it benefits
> from faster I/O such as SSD and NVMe.
> I am using nightlybench for indexing wiki (1K docs) with similar
> parameters as used in nightlyBench (hardware: Intel Xeon, 2.5 GHz,
> 20 processors, 40 with hyperthreading, 64 GB memory) and study indexing
> speed on HDD, SSD and NVMe. While I do see a benefit when switching from
> HDD to SSD, there is not much noticeable benefit moving to NVMe.
> I get the best performance (200 GB/hour) with 20 indexing threads;
> increasing the number of threads to 40 hurts performance. Similarly,
> increasing maxConcurrentMerges above 3-5 doesn't seem to give me any
> benefit. I am wondering what the bottleneck is, or whether anyone has
> insight on a set of options (number of threads, merge options, flush
> options, read buffer?) to take advantage of a very fast I/O system.
> I see NVMe bandwidth going as high as 800 MB/s, but only in short
> spikes, and CPU utilization is about 50% on average, though some cores
> have consistently higher utilization while others have spiky behavior.
> Your thoughts and insight are greatly appreciated. Thanks.
> Anahita Shayesteh
