lucene-java-user mailing list archives

From Sriram Sankar <san...@gmail.com>
Subject Re: NRT + static rank based sorting
Date Fri, 12 Jul 2013 19:55:31 GMT
Thanks!


On Tue, Jul 9, 2013 at 2:13 PM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi Sriram,
>
> On Tue, Jul 9, 2013 at 5:06 AM, Sriram Sankar <sankar@gmail.com> wrote:
> > I've finally got something running and will send you some performance
> > numbers as promised shortly.  In the meanwhile, I've a question regarding
> > the use of real time indexing along with ordering by static rank.  Before
> > each search, I do the reopen as follows:
> >
> >     public void refresh() throws IOException {
> >         DirectoryReader r = DirectoryReader.openIfChanged(reader);
> >         if (r != null) {
> >             reader.close();
> >             reader = r;
> >             this.live = SortingAtomicReader.wrap(
> >                     new SlowCompositeReaderWrapper(reader),
> >                     new StaticRankSorter());
> >         }
> >     }
> >
> > This works fine.  However, I believe the index is resorted every time I
> > reopen the index.  Ideally, it would be nice to do the sort more
> > incrementally each time a new document gets added.  I assume that this is
> > not easy - but just in case you have ideas, I'd like to hear them.
>
> I think a good trade-off could be to fully collect the small segments
> that come from incremental updates. Since they are small, collecting
> them will be fast anyway. On the other hand, the bottleneck is likely
> the collection of large segments. This is why we chose to tackle the
> problem of online sorting using a merge policy (SortingMergePolicy).
> Segments are only sorted when merging, meaning that small NRT
> (flushed) segments won't be sorted but large (merged) segments will
> be.
>
> Then computing the top hits is just a matter of computing the best
> hits on every segment and merging them into a single hit list:
>  - for flushed segments, you need to fully collect them like Lucene
> does by default,
>  - for sorted segments, you can early-terminate collection on a
> per-segment basis when enough matches have been collected.
>
> --
> Adrien
>
>
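The per-segment strategy described in the quoted reply can be sketched in plain Java. This is only an illustration under stated assumptions, not Lucene code: `Segment`, `topK`, and `search` are hypothetical names, each segment is modelled as an array holding the static ranks of its matching documents in doc-id order, and a higher rank is assumed to be better. Sorted segments stop collecting once k hits are in; unsorted (flushed) segments are scanned in full; the per-segment results are then merged into one hit list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class PerSegmentTopK {

    /** Hypothetical stand-in for an index segment (not a Lucene class). */
    static final class Segment {
        final int[] ranks;    // static rank of each matching doc, in doc-id order
        final boolean sorted; // true if docs are stored by descending static rank

        Segment(int[] ranks, boolean sorted) {
            this.ranks = ranks;
            this.sorted = sorted;
        }
    }

    /** Collects the k best ranks of one segment; visited[0] counts docs examined. */
    static List<Integer> topK(Segment seg, int k, int[] visited) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap of the best k so far
        for (int rank : seg.ranks) {
            // In a sorted segment the first k docs are already the best k,
            // so collection can stop as soon as k hits have been gathered.
            if (seg.sorted && heap.size() >= k) {
                break;
            }
            visited[0]++;
            heap.offer(rank);
            if (heap.size() > k) {
                heap.poll(); // drop the current worst hit
            }
        }
        return new ArrayList<>(heap);
    }

    /** Merges the per-segment hit lists into a single top-k list, best first. */
    static List<Integer> search(List<Segment> segments, int k, int[] visited) {
        List<Integer> all = new ArrayList<>();
        for (Segment seg : segments) {
            all.addAll(topK(seg, k, visited));
        }
        all.sort(Collections.reverseOrder());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(new int[] {90, 70, 50, 30, 10}, true), // large merged, sorted
                new Segment(new int[] {20, 95, 40}, false));       // small flushed, unsorted
        int[] visited = {0};
        System.out.println("top hits: " + search(segments, 2, visited));
        System.out.println("docs visited: " + visited[0]);
    }
}
```

In this toy model the saving comes entirely from the large sorted segment, where only the first k documents are visited; the small unsorted segment is still collected in full, which matches the trade-off described above.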
