lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: SortingAtomicReader alternate to Tim-Sort...
Date Thu, 30 Apr 2015 07:02:04 GMT
>
> Would you like to submit a patch that changes SortingMergePolicy to
> use the approach that you are proposing using bitsets instead of
> sorting int[] arrays?


Sure, I can do that. Could you open a ticket for this? I don't know which
versions it can go into.
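A minimal sketch of the bitset idea, for illustration only. It assumes a hypothetical old-to-new doc ID permutation held in a plain int[] (standing in for the sorter's doc map) and uses java.util.BitSet in place of Lucene's FixedBitSet. It remaps doc IDs only; a real postings remap would also have to carry freqs and positions along.

```java
import java.util.Arrays;
import java.util.BitSet;

final class BitsetRemap {

    /**
     * Hypothetical helper (not Lucene API): maps each old doc ID through
     * oldToNew and returns the new IDs in ascending order without sorting.
     * Assumes oldToNew is a permutation of [0, maxDoc).
     */
    static int[] remapSorted(int[] docIds, int[] oldToNew, int maxDoc) {
        // java.util.BitSet stands in for Lucene's FixedBitSet here.
        BitSet bits = new BitSet(maxDoc);
        for (int oldDoc : docIds) {
            bits.set(oldToNew[oldDoc]); // O(1) per posting
        }
        int[] out = new int[docIds.length];
        int i = 0;
        // Set bits come back in ascending order. Allocating and scanning the
        // bitset costs O(maxDoc / 64) word operations, so the cost grows with
        // maxDoc rather than with the number of postings.
        for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
            out[i++] = doc;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] oldToNew = {3, 0, 4, 1, 2}; // hypothetical permutation, maxDoc = 5
        int[] postings = {0, 2, 4};       // old doc IDs containing the term
        System.out.println(Arrays.toString(remapSorted(postings, oldToNew, 5)));
    }
}
```

Running main prints [2, 3, 4]. Setting one bit per posting and reading the bits back in order replaces the O(n log n) sort of an int[], but the bitset pass is proportional to maxDoc even for a tiny postings list, which is the O(maxDoc) concern raised about FixedBitSet further down; a sparse bitset such as SparseFixedBitSet is the suggested way to soften that.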

--
Ravi



On Tue, Apr 28, 2015 at 6:03 PM, Adrien Grand <jpountz@gmail.com> wrote:

> On Tue, Apr 21, 2015 at 10:00 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > Thanks for the comments…
> >
> >> My only concern about using the FixedBitSet is that it would make sorting
> >> each postings list run in O(maxDoc), but maybe we can make it better by
> >> using SparseFixedBitSet.
> >
> >
> > Yes, I was also thinking about this. But we are on 4.x and have not taken
> > the plunge. As you said, though, it would be a good idea to test with
> > SparseFixedBitSet (SFBS).
>
> Would you like to submit a patch that changes SortingMergePolicy to
> use the approach that you are proposing using bitsets instead of
> sorting int[] arrays?
>
> >> I'm curious if you already performed any kind of benchmarking of this
> >> approach?
> >
> >
> > Yes, we did a stress test of sorts aimed at SortingMergePolicy. We made
> > most of our data RAM-resident, and then CPU hot-spots came up...
> >
> > There were a few takeaways from the test. I am listing some of them below.
> > It's kind of lengthy, so please read through...
> >
> > a) Postings-List issue, as discussed above…
> >
> > b) CompressingStoredFieldsReader did not cache the last decoded 32KB
> > chunk. Our segments are already sorted before participating in a merge.
> > On a mostly linear merge, we ended up decoding the same chunk again and
> > again. Simply caching the last chunk resulted in good speed-ups for us...
> >
> > c) Once the above steps were corrected, the CPU hotspot shifted to
> > BlockDocsEnum. Most of our postings lists have < 128 docs, so
> > Lucene41Postings falls back to vInts for them… I did try ForUtil encoding
> > even for < 128 docs. It definitely went easier on the CPU, but I did not
> > measure the resulting file-size increase.
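To illustrate why item c) is CPU-bound, here is a simplified sketch of delta + vInt coding of doc IDs. It uses the same byte layout as Lucene's DataOutput.writeVInt (7 payload bits per byte, low-order group first, high bit set when more bytes follow), but it ignores freqs and is not the actual Lucene41 postings code; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class VIntDeltas {

    /** Writes doc-ID gaps as vInts: 7 payload bits per byte, high bit = "more bytes follow". */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int doc : sortedDocIds) {
            int delta = doc - prev; // gaps are small when doc IDs are dense
            prev = doc;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits plus continuation flag
                delta >>>= 7;
            }
            out.write(delta); // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Decodes count doc IDs; note the data-dependent branch on every byte. */
    static int[] decode(byte[] in, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int b = in[pos++];
            int delta = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = in[pos++];
                delta |= (b & 0x7F) << shift;
            }
            prev += delta;
            docs[i] = prev;
        }
        return docs;
    }

    public static void main(String[] args) {
        int[] docs = {5, 130, 131, 20000};
        byte[] enc = encode(docs);
        System.out.println(enc.length + " bytes: " + Arrays.toString(decode(enc, docs.length)));
    }
}
```

main prints 6 bytes: [5, 130, 131, 20000] (the last gap, 19869, needs three bytes). Every decoded byte pays a data-dependent branch, whereas a block codec such as ForUtil packs 128 deltas at one fixed bit width and unpacks them with shifts and masks, which is consistent with the lower CPU use reported above for ForUtil on short lists.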
> >
> > I realised that not just SMP but any other merge must face the same issue,
> > and left it at that.
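Item b) can be illustrated with a minimal last-chunk cache. This is not CompressingStoredFieldsReader's code: the decodeChunk function and the fixed docsPerChunk are hypothetical simplifications (real chunks hold a variable number of documents and are located through an index), and the 32KB buffers only mirror the chunk size mentioned in the message.

```java
import java.util.function.IntFunction;

final class LastChunkCache {

    private final IntFunction<byte[]> decodeChunk; // hypothetical: decompresses chunk #i
    private final int docsPerChunk;               // simplification: fixed docs per chunk
    private int cachedChunk = -1;                 // no chunk decoded yet
    private byte[] cachedBytes;
    int decodes = 0;                              // counts real decompressions

    LastChunkCache(IntFunction<byte[]> decodeChunk, int docsPerChunk) {
        this.decodeChunk = decodeChunk;
        this.docsPerChunk = docsPerChunk;
    }

    /** Returns the decoded chunk holding docId, decompressing only when the chunk changes. */
    byte[] chunkFor(int docId) {
        int chunk = docId / docsPerChunk;
        if (chunk != cachedChunk) {
            cachedBytes = decodeChunk.apply(chunk);
            cachedChunk = chunk;
            decodes++;
        }
        return cachedBytes;
    }

    public static void main(String[] args) {
        LastChunkCache cache = new LastChunkCache(i -> new byte[32 * 1024], 100);
        for (int doc = 0; doc < 1000; doc++) {
            cache.chunkFor(doc); // near-sequential access, as in a merge of already-sorted segments
        }
        System.out.println("decodes: " + cache.decodes);
    }
}
```

main prints decodes: 10, one per chunk; a reader that decompressed on every lookup would do 1000 decodes for the same access pattern. Keeping a single chunk is enough here because a mostly linear merge rarely revisits an earlier chunk.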
>
> True. Like Robert said, there has been work done on b) already and I
> think we can move forward on a) too. Thanks for sharing your findings!
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
