lucene-general mailing list archives

From Sanjoy Das
Subject Re: Benchmarking Lucene
Date Mon, 23 Nov 2015 20:23:44 GMT

Michael McCandless wrote:
 > Which JVM vendor :)  There are not so many, unfortunately...

I work for Azul Systems.

 > I run nightly benchmarks for Lucene, which are visible at
 > We use this to catch accidental performance regressions... the sources
 > for all of this are at but
 > running them yourself can be tricky.  They index and search
 > Wikipedia's English export.

I was hoping to get hold of benchmarks that are a little more
"lightweight" -- something that I can run from beginning to end in <
30 minutes.  Is there an interesting subset of the nightly tests that
I can run within that sort of timeframe?

 > Lucene is definitely JVM/GC bound in many cases, e.g. when the index
 > is "hot" (fully cached by the OS in free RAM).
 > I'm not familiar with Dacapo...
 > I'm not sure how aggressively users upgrade ... but I believe most
 > users use Lucene via Elasticsearch or Solr.
 > Mike McCandless
 > On Mon, Nov 23, 2015 at 2:42 PM, Sanjoy Das wrote:
 >> Hi all,
 >> I work for a JVM vendor, and we're interested in obtaining / creating
 >> a set of Lucene benchmarks for internal use.  We plan to use these for
 >> performance regression testing and general performance analysis
 >> (i.e. to make sure Lucene performs well on our JVM).  I'm especially
 >> interested in benchmarks that demonstrate opportunities for
 >> improvements in our JIT compiler.
 >> While I imagine that the lucene/benchmark/ directory is probably the
 >> right place to start, I have a few high-level questions that are best
 >> answered by people on this mailing list:
 >> - Are there realistic Lucene workloads that are bottlenecked on the
 >>    JVM's performance (JIT, GC etc.) and *not* e.g. disk / network IO?
 >>    If so, what are some examples?
 >> - How relevant are the Dacapo "luindex" and "lusearch" benchmarks
 >>    today?  Will porting them to the latest version of Lucene give me a
 >>    benchmark representative of modern Lucene usage, or have Lucene's
 >>    performance characteristics evolved in fundamental ways since Dacapo
 >>    was published?
 >> - What is the distribution of Lucene versions in production
 >>    deployments?  Do users tend to aggressively upgrade to the "latest
 >>    and greatest" Lucene version, or is there usually a non-trivial lag?
 >> Any other information that you think is useful or relevant is
 >> welcome.
 >> Thanks!
 >> -- Sanjoy
