lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: SortingAtomicReader alternate to Tim-Sort...
Date Tue, 28 Apr 2015 12:33:47 GMT
On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
<ravikumar.govindarajan@gmail.com> wrote:
> Thanks for the comments…
>
>> My only concern about using the FixedBitSet is that it would make
>> sorting each postings list run in O(maxDoc) but maybe we can make it
>> better by using SparseFixedBitSet
>
>
> Yes I was also thinking about this. But we are on 4.x and did not take the
> plunge. But as you said, it should be a good idea to test on SFBS

Would you like to submit a patch that changes SortingMergePolicy to
use the approach that you are proposing using bitsets instead of
sorting int[] arrays?
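
To make the proposal concrete, here is a minimal sketch of the bitset idea,
under several assumptions: java.util.BitSet stands in for Lucene's
FixedBitSet / SparseFixedBitSet, the class and method names are made up for
illustration, oldToNew is assumed to be a one-to-one doc ID mapping, and only
doc IDs are handled (a real patch would also have to carry freqs and
positions along).

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; not the SortingMergePolicy code.
final class BitSetPostingsSort {

    /**
     * Remaps the doc IDs of one postings list through oldToNew and
     * returns them in increasing new-ID order, without sorting an int[].
     */
    static int[] remapAndSort(int[] postings, int[] oldToNew, int maxDoc) {
        // One bit per document: this allocation is what makes the
        // fixed-size variant cost O(maxDoc) per postings list.
        BitSet bits = new BitSet(maxDoc);
        for (int oldDoc : postings) {
            bits.set(oldToNew[oldDoc]);
        }
        // Walking the set bits yields the new doc IDs already sorted.
        int[] sorted = new int[postings.length];
        int i = 0;
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            sorted[i++] = doc;
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] oldToNew = {4, 2, 0, 3, 1};
        int[] postings = {0, 1, 3};
        System.out.println(Arrays.toString(remapAndSort(postings, oldToNew, 5)));
    }
}
```

A sparse bitset would presumably keep the same shape and only change the cost
of allocating and scanning the bits for short postings lists.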

>> I'm curious if you already performed any kind of benchmarking of
>> this approach?
>
>
> Yes, we did a stress test of sorts aimed at SortingMergePolicy. We made most
> of our data RAM-resident, and then CPU hot-spots came up...
>
> There were a few take-aways from the test. I am listing a few of them.
> It's kind of lengthy. Please read through...
>
> a) Postings-List issue, as discussed above…
>
> b) CompressingStoredFieldsReader did not store the last decoded 32KB chunk.
> Our segments are already sorted before participating in a merge. On mostly
> linear merge, we ended up decoding the same chunk again and again. Simply
> storing the last chunk resulted in good speed-ups for us...
>
> c) Once the above steps were corrected, the CPU hotspot shifted to
> BlockDocsEnum. Here most of our postings lists have < 128 docs. So
> Lucene41Postings started using vInts…  I did try ForUtil encoding even for
> < 128 docs. It definitely went easy on CPU. But I failed to measure the
> resulting file-size increase.
>
> I realised not just SMP but any other merge must face the same issue and
> left it at that..

True. Like Robert said, there has been work done on b) already and I
think we can move forward on a) too. Thanks for sharing your findings!
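
For readers following along, the idea behind b) (keep the last decoded chunk
so that a mostly linear merge does not decode it again) could be sketched
roughly as below. This is only an illustration: LastChunkCache, the decoder
function and the decodeCount counter are hypothetical names, not the
CompressingStoredFieldsReader API or the actual fix.

```java
import java.util.function.IntFunction;

// Hypothetical illustration only; not Lucene code.
final class LastChunkCache {
    private final IntFunction<byte[]> decoder; // assumed: decodes one chunk by ID
    private int cachedChunkId = -1;            // -1 means nothing cached yet
    private byte[] cachedChunk;
    int decodeCount;                           // exposed only to observe the effect

    LastChunkCache(IntFunction<byte[]> decoder) {
        this.decoder = decoder;
    }

    /** Returns the decoded chunk, decoding only when the chunk ID changes. */
    byte[] chunk(int chunkId) {
        if (chunkId != cachedChunkId) {
            cachedChunk = decoder.apply(chunkId);
            cachedChunkId = chunkId;
            decodeCount++;
        }
        return cachedChunk;
    }

    public static void main(String[] args) {
        LastChunkCache cache = new LastChunkCache(id -> new byte[] {(byte) id});
        // Documents read in order hit the same chunk several times in a row.
        for (int chunkId : new int[] {0, 0, 0, 1, 1}) {
            cache.chunk(chunkId);
        }
        System.out.println("decodes: " + cache.decodeCount);
    }
}
```

A single-entry cache like this only helps when consecutive reads land in the
same chunk, which is the already-sorted, mostly linear merge case described
above.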

-- 
Adrien
