lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Lucene indexing throughput (and Mike's lucenebench charts)
Date Mon, 23 May 2016 16:31:17 GMT
I finally dug into this, and it turns out the nightly benchmark I run had
bad bottlenecks such that it couldn't feed documents quickly enough to
Lucene to take advantage of the concurrent hardware in beast2.

I fixed that and just re-ran the nightly run and it shows good gains:
https://plus.google.com/+MichaelMcCandless/posts/6mzSoY4ucFE

I suspect more gains are possible ... I need to play some more.

Mike McCandless

http://blog.mikemccandless.com

On Fri, Apr 15, 2016 at 12:43 PM, Robert Muir <rcmuir@gmail.com> wrote:

> you won't see indexing improvements there because the dataset in
> question is wikipedia and mostly indexing full text. I think it may
> have one measly numeric field.
>
> On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić
> <otis.gospodnetic@gmail.com> wrote:
> > (replying to my original email because I didn't get people's replies, even
> > though I see in the archives people replied)
> >
> > Re BJ and beast2 upgrade.  Yeah, I saw that, but....
> > * if there is no indexing throughput improvement after that, does that mean
> > that those particular indexing tests happen to be disk bound and not CPU
> > bound? (I'm assuming beast2 has more cores than the previous hardware....
> > oh, I see, 72 cores vs. only 20 indexing threads)
> > * the metrics for GC times are sums across all CPUs, not averages per CPU?
> > Would the latter be more useful?
> >
> > What I was fishing for was something in that indexing chart that would show
> > me this little nugget:
> >
> > *Lucene 6 brings a major new feature called Dimensional Points: a new
> > tree-based data structure which will be used for numeric, date, and
> > geospatial fields. Compared to the existing field format, this new
> > structure uses half the disk space, is twice as fast to index, and
> > increases search performance by 25%.*
> >
> > How come the charts on
> > http://home.apache.org/~mikemccand/lucenebench/indexing.html don't show the
> > 2x faster indexing and various query performance charts don't show 25%
> > improvement in search performance?
> >
> > Thanks,
> > Otis
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> > On Thu, Apr 14, 2016 at 1:13 PM, Otis Gospodnetić <
> > otis.gospodnetic@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I was looking at Mike's
> >> http://home.apache.org/~mikemccand/lucenebench/indexing.html secretly
> >> hoping to spot some recent improvements in indexing throughput.... but
> >> instead it looks like:
> >>
> >> * indexing throughput hasn't really gone up in the last ~5 years
> >> * indexing was faster in 2014, but then dropped to pre-2014 speed in early
> >> 2015
> >> * indexing rate dropped some more in early 2016, and that seems to roughly
> >> correlate to a *big* jump in Young GC in late 2015
> >>
> >> Does anyone know what happened in late 2015 that causes that big Young GC
> >> jump?
> >> Or does that big jump just look scary in that chart, but is not actually a
> >> big concern in practice?
> >>
> >> Thanks,
> >> Otis
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
