lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Lucene indexing throughput (and Mike's lucenebench charts)
Date Fri, 15 Apr 2016 16:43:28 GMT
You won't see indexing improvements there because the dataset in
question is Wikipedia, so the benchmark is mostly indexing full text.
I think it may have one measly numeric field.
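
To put rough numbers on that: if only a small share of indexing time
goes to numeric fields, making that share 2x faster barely moves the
total. Below is a minimal sketch of the arithmetic (Amdahl's law). The
shares are made-up assumptions, not measurements from the benchmark,
and `overallSpeedup` is a hypothetical helper, not a Lucene API.

```java
public class PointsSpeedupEstimate {

    /**
     * Amdahl-style estimate of the overall indexing speedup when only the
     * numeric-field portion of the work gets faster.
     *
     * @param numericFraction share of indexing time spent on numeric fields (0..1), assumed
     * @param numericSpeedup  speedup of that portion (2.0 means "twice as fast")
     */
    static double overallSpeedup(double numericFraction, double numericSpeedup) {
        // The full-text share is unchanged; the numeric share shrinks by the speedup factor.
        return 1.0 / ((1.0 - numericFraction) + numericFraction / numericSpeedup);
    }

    public static void main(String[] args) {
        // Hypothetical shares: a text-heavy corpus, a mixed one, and a numeric-heavy one.
        double[] fractions = {0.02, 0.10, 0.50};
        for (double f : fractions) {
            System.out.printf("numeric share %.0f%% -> overall indexing speedup %.3fx%n",
                    f * 100, overallSpeedup(f, 2.0));
        }
    }
}
```

With a text-heavy corpus like this one the estimated gain stays within
a few percent, which would be hard to tell apart from noise in a
nightly chart.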

On Thu, Apr 14, 2016 at 6:25 PM, Otis Gospodnetić
<otis.gospodnetic@gmail.com> wrote:
> (replying to my original email because I didn't get people's replies, even
> though I see in the archives people replied)
>
> Re BJ and beast2 upgrade.  Yeah, I saw that, but....
> * if there is no indexing throughput improvement after that, does that mean
> that those particular indexing tests happen to be disk bound and not CPU
> bound? (I'm assuming beast2 has more cores than the previous hardware....
> oh, I see, 72 cores vs. only 20 indexing threads)
> * the metrics for GC times are sums across all CPUs, not averages per CPU?
> Would the latter be more useful?
>
> What I was fishing for was something in that indexing chart that would show
> me this little nugget:
>
> *Lucene 6 brings a major new feature called Dimensional Points: a new
> tree-based data structure which will be used for numeric, date, and
> geospatial fields. Compared to the existing field format, this new
> structure uses half the disk space, is twice as fast to index, and
> increases search performance by 25%.*
>
> How come the charts on
> http://home.apache.org/~mikemccand/lucenebench/indexing.html don't show the
> 2x faster indexing and various query performance charts don't show 25%
> improvement in search performance?
>
> Thanks,
> Otis
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> On Thu, Apr 14, 2016 at 1:13 PM, Otis Gospodnetić
> <otis.gospodnetic@gmail.com> wrote:
>
>> Hi,
>>
>> I was looking at Mike's
>> http://home.apache.org/~mikemccand/lucenebench/indexing.html secretly
>> hoping to spot some recent improvements in indexing throughput.... but
>> instead it looks like:
>>
>> * indexing throughput hasn't really gone up in the last ~5 years
>> * indexing was faster in 2014, but then dropped to pre-2014 speed in early
>> 2015
>> * indexing rate dropped some more in early 2016, and that seems to roughly
>> correlate to a *big* jump in Young GC in late 2015
>>
>> Does anyone know what happened in late 2015 that causes that big Young GC
>> jump?
>> Or does that big jump just look scary in that chart, but is not actually a
>> big concern in practice?
>>
>> Thanks,
>> Otis
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>


