lucene-java-user mailing list archives

From Arvind Kalyan <bas...@gmail.com>
Subject Re: Merging ordered segments without re-sorting.
Date Wed, 23 Oct 2013 20:19:28 GMT
Thanks again.

Sorting is not an option for our case so we will most likely implement a
variant that merges the segments in one pass. Using TimSort is great but in
our case the 2 segments will be highly interspersed and would not benefit
from the galloping in TimSort.

In addition, if anyone else on the list has any input on implementing
the merge (without a sort), I'd appreciate it as well! More than likely I'll
have follow-up questions if we decide to go this route.
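
For concreteness, here is a minimal sketch of the one-pass merge idea,
assuming each segment's sort key can be read into a long[] that is already
in ascending order. The class and method names are hypothetical and this is
not tied to any Lucene API; it only computes, for each input document, the
position it would take in the merged output.

```java
import java.util.Arrays;

// Hypothetical helper: not part of Lucene. It assumes the sort key of every
// document in each segment has been read into an ascending long[].
final class OrderedMergeSketch {

    /**
     * Two-way merge of two already-sorted key arrays in a single pass.
     * Returns two maps: result[0][i] is the new position of document i of
     * segment X, result[1][j] is the new position of document j of segment Y.
     * Ties are resolved in favour of segment X, so the merge is stable.
     */
    static int[][] mergeDocMaps(long[] keysX, long[] keysY) {
        int[] mapX = new int[keysX.length];
        int[] mapY = new int[keysY.length];
        int i = 0, j = 0, next = 0;

        // Advance whichever side currently holds the smaller key.
        while (i < keysX.length && j < keysY.length) {
            if (keysX[i] <= keysY[j]) {
                mapX[i++] = next++;
            } else {
                mapY[j++] = next++;
            }
        }
        // One side is exhausted; the rest of the other side keeps its order.
        while (i < keysX.length) {
            mapX[i++] = next++;
        }
        while (j < keysY.length) {
            mapY[j++] = next++;
        }
        return new int[][] { mapX, mapY };
    }

    public static void main(String[] args) {
        // Keys from the example in the original mail: X = (a,1) (b,2) (e,10),
        // Y = (c,4) (d,6).
        int[][] maps = mergeDocMaps(new long[] {1, 2, 10}, new long[] {4, 6});
        System.out.println(Arrays.toString(maps[0])); // [0, 1, 4]
        System.out.println(Arrays.toString(maps[1])); // [2, 3]
    }
}
```

This does O(n + m) comparisons regardless of how interspersed the two
segments are. How such a mapping would be plugged into the actual segment
merge is a separate question that the sketch does not address.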


On Wed, Oct 23, 2013 at 12:56 PM, Shai Erera <serera@gmail.com> wrote:

> SortingAtomicReader uses the TimSort algorithm, which performs well when
> the two segments are already sorted.
> Anyway, that's the way to do it, even if it looks like it does more work
> than it should.
>
> Shai
>
>
> On Wed, Oct 23, 2013 at 10:46 PM, Arvind Kalyan <base16@gmail.com> wrote:
>
> > Thanks, my understanding is that SortingMergePolicy performs sorting
> > after wrapping the 2 segments, correct?
> >
> > As I mentioned in my original email I would like to avoid the re-sorting
> > and exploit the fact that the input segments are already sorted.
> >
> >
> >
> > On Wed, Oct 23, 2013 at 11:02 AM, Shai Erera <serera@gmail.com> wrote:
> >
> > > Hi
> > >
> > > You can use SortingMergePolicy and SortingAtomicReader to achieve that.
> > > You can read more about index sorting here:
> > > http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
> > >
> > > Shai
> > >
> > >
> > > On Wed, Oct 23, 2013 at 8:13 PM, Arvind Kalyan <base16@gmail.com> wrote:
> > >
> > > > Hi there, I'm looking for pointers, suggestions on how to approach
> > > > this in Lucene 4.5.
> > > >
> > > > Say I am creating an index using a sequence of addDocument() calls
> > > > and end up with segments that each contain documents in a specified
> > > > ordering. It is guaranteed that there won't be updates/deletes/reads
> > > > etc happening on the index -- this is an offline index building task
> > > > for a read-only index.
> > > >
> > > > I create the index in the above mentioned fashion using
> > > > LogByteSizeMergePolicy and finally do a forceMerge(1) to get a
> > > > single segment in the ordering I want.
> > > >
> > > > Now my requirement is that I need to be able to merge this single
> > > > segment with another such segment (say from yesterday's index) and
> > > > guarantee some ordering -- say I have a comparator which looks at
> > > > some field values in the 2 given docs and defines the ordering.
> > > >
> > > > Index 1 with segment X:
> > > > (a,1)
> > > > (b,2)
> > > > (e,10)
> > > >
> > > > Index 2 (say from yesterday) with some segment Y:
> > > > (c,4)
> > > > (d,6)
> > > >
> > > > Essentially we have 2 ordered segments, and I'm looking to 'merge'
> > > > them (literally) using the value of some field, without having to
> > > > re-sort them which would be too time & resource consuming.
> > > >
> > > > Output Index, with some segment Z:
> > > > (a,1)
> > > > (b,2)
> > > > (c,4)
> > > > (d,6)
> > > > (e,10)
> > > >
> > > > Is this already possible? If not, any tips on how I can approach
> > > > implementing this requirement?
> > > >
> > > > Thanks,
> > > >
> > > > --
> > > > Arvind Kalyan
> > > >
> > >
> >
> >
> >
> > --
> > Arvind Kalyan
> > http://www.linkedin.com/in/base16
> > cell: (408) 761-2030
> >
>



-- 
Arvind Kalyan
http://www.linkedin.com/in/base16
cell: (408) 761-2030
