lucene-general mailing list archives

From Sanjoy Das <san...@playingwithpointers.com>
Subject Re: Benchmarking Lucene
Date Mon, 23 Nov 2015 20:23:44 GMT


Michael McCandless wrote:
 > Which JVM vendor :)  There are not so many, unfortunately...

I work for Azul Systems (https://www.azul.com).

 > I run nightly benchmarks for Lucene, which are visible at
 > https://people.apache.org/~mikemccand/lucenebench/
 >
 > We use this to catch accidental performance regressions... the sources
 > for all of this are at https://github.com/mikemccand/luceneutil but
 > running them yourself can be tricky.  They index and search
 > Wikipedia's English export.

I was hoping to get hold of benchmarks that are a little more
"lightweight": something that I can run from beginning to end in
under 30 minutes.  Is there an interesting subset of the nightly
tests that I can run within that timeframe?

 > Lucene is definitely JVM/GC bound in many cases, e.g. when the index
 > is "hot" (fully cached by the OS in free RAM).
 >
 > I'm not familiar with Dacapo...
 >
 > I'm not sure how aggressively users upgrade ... but I believe most
 > users use Lucene via Elasticsearch or Solr.
 >
 > Mike McCandless
 >
 > http://blog.mikemccandless.com
 >
 >
 > On Mon, Nov 23, 2015 at 2:42 PM, Sanjoy Das
 > <sanjoy@playingwithpointers.com>  wrote:
 >> Hi all,
 >>
 >> I work for a JVM vendor, and we're interested in obtaining / creating
 >> a set of Lucene benchmarks for internal use.  We plan to use these for
 >> performance regression testing and general performance analysis
 >> (i.e. to make sure Lucene performs well on our JVM).  I'm especially
 >> interested in benchmarks that demonstrate opportunities for
 >> improvements in our JIT compiler.
 >>
 >> While I imagine that the lucene/benchmark/ directory is probably the
 >> right place to start, I have a few high-level questions that are best
 >> answered by people on this mailing list:
 >>
 >> - Are there realistic Lucene workloads that are bottlenecked on the
 >>    JVM's performance (JIT, GC, etc.) and *not* on e.g. disk / network IO?
 >>    If so, what are some examples?
 >>
 >> - How relevant are the Dacapo "luindex" and "lusearch" benchmarks
 >>    today?  Will porting them to the latest version of Lucene give me a
 >>    benchmark representative of modern Lucene usage, or have Lucene's
 >>    performance characteristics evolved in fundamental ways since Dacapo
 >>    was published?
 >>
 >> - What is the distribution of Lucene versions in production
 >>    deployments?  Do users tend to aggressively upgrade to the "latest
 >>    and greatest" Lucene version, or is there usually a non-trivial lag?
 >>
 >> Any other information that you think is useful or relevant is
 >> welcome.
 >>
 >> Thanks!
 >> -- Sanjoy
