lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: block min-max values for Sort Field with Top-N query..
Date Wed, 03 Jul 2019 05:51:57 GMT
Thanks Mikhail & Adrien for the help

> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.


Ah, I did not know block-max WAND is now in Lucene! So what I am proposing
looks identical to BM-WAND.

The heavy lifting is already done in the Lucene codebase. I think it should be
straightforward for us to wrap DocValues in a custom Codec to track block
min-max ords. We shall give this a shot anyway & see how it goes.
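
To make the idea concrete, here is a minimal sketch in plain Java of how per-block max ords could let a top-N collector skip whole blocks when sorting in descending order. It is not Lucene code and uses no Lucene API: the class name `BlockMaxSketch`, its `seek` and `topNDescending` methods, and the in-memory `long[]` standing in for DocValues are all assumptions made for illustration. An ascending sort would track per-block min ords in the same way.

```java
import java.util.PriorityQueue;

/** Hypothetical illustration only; not part of Lucene. */
final class BlockMaxSketch {
    private final long[] ords;      // stand-in for per-document sort ords
    private final int blockSize;    // e.g. 128 or 256 documents per block
    private final long[] blockMax;  // max ord of each block, computed "at index time"
    int docsVisited;                // how many docs the last top-N run looked at

    BlockMaxSketch(long[] ords, int blockSize) {
        this.ords = ords;
        this.blockSize = blockSize;
        int numBlocks = (ords.length + blockSize - 1) / blockSize;
        blockMax = new long[numBlocks];
        for (int b = 0; b < numBlocks; b++) {
            long max = Long.MIN_VALUE;
            int end = Math.min(ords.length, (b + 1) * blockSize);
            for (int doc = b * blockSize; doc < end; doc++) {
                max = Math.max(max, ords[doc]);
            }
            blockMax[b] = max;
        }
    }

    /**
     * Returns the first doc >= target that lies in a block whose max ord is
     * greater than minCompetitiveOrd, or ords.length if no such block exists.
     */
    int seek(int target, long minCompetitiveOrd) {
        int doc = target;
        while (doc < ords.length) {
            int block = doc / blockSize;
            if (blockMax[block] > minCompetitiveOrd) {
                return doc;
            }
            doc = (block + 1) * blockSize; // the whole block is non-competitive
        }
        return ords.length;
    }

    /** Collects the n largest ords, highest first, skipping non-competitive blocks. */
    long[] topNDescending(int n) {
        docsVisited = 0;
        if (n <= 0) {
            return new long[0];
        }
        PriorityQueue<Long> queue = new PriorityQueue<>(); // min-heap: head is the weakest hit
        int doc = 0;
        while (doc < ords.length) {
            if (queue.size() == n) {
                doc = seek(doc, queue.peek());
                if (doc >= ords.length) {
                    break;
                }
            }
            docsVisited++;
            long ord = ords[doc];
            if (queue.size() < n) {
                queue.add(ord);
            } else if (ord > queue.peek()) {
                queue.poll();
                queue.add(ord);
            }
            doc++;
        }
        long[] top = new long[queue.size()];
        for (int i = top.length - 1; i >= 0; i--) {
            top[i] = queue.poll();
        }
        return top;
    }
}
```

How much this saves depends entirely on the data distribution: if ords grow with doc ID, every block stays competitive and nothing is skipped.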

> Directly index the field as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.
>

Thanks for the suggestion. Sure, will try & dabble with FeatureField too!
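
For the LongDistanceFeatureQuery option Adrien mentions, the trick is that the score decreases as a value moves away from the "origin", so ranking by score is the same as sorting by the field. The sketch below is plain Java rather than Lucene code. It assumes the documented scoring shape `weight * pivotDistance / (pivotDistance + |value - origin|)`; the class `DistanceFeatureSketch` and its methods are hypothetical names used only to show why passing the max value gives a descending sort and the min value an ascending one.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical illustration only; not part of Lucene. */
final class DistanceFeatureSketch {

    /** Score that shrinks as value moves away from origin (assumed scoring shape). */
    static double score(long value, long origin, long pivotDistance, double weight) {
        double distance = Math.abs((double) value - (double) origin);
        return weight * pivotDistance / (pivotDistance + distance);
    }

    /** Returns doc IDs (array indices) ordered by decreasing score. */
    static Integer[] rankByScore(long[] values, long origin, long pivotDistance) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        // Negate the score so that the natural ascending sort puts the best hit first.
        Arrays.sort(docs, Comparator.comparingDouble(
                (Integer d) -> -score(values[d], origin, pivotDistance, 1.0)));
        return docs;
    }

    public static void main(String[] args) {
        long[] values = {5, 42, 17, 30};
        // Origin = max value: closest-to-max ranks first, i.e. descending order.
        System.out.println(Arrays.toString(rankByScore(values, 42, 10)));
        // Origin = min value: closest-to-min ranks first, i.e. ascending order.
        System.out.println(Arrays.toString(rankByScore(values, 5, 10)));
    }
}
```

The pivot distance only changes how quickly the score falls off, not the resulting order, as long as distinct values still map to distinct scores.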

--
Ravi

On Tue, Jul 2, 2019 at 6:52 PM Adrien Grand <jpountz@gmail.com> wrote:

> Hello,
>
> This is the same principle that we apply for block-max WAND so
> theoretically that would work, though in practice it might be a bit
> hard to implement due to the fact that we don't have the APIs that you
> will need.
>
> I have considered the idea of adding information about blocks to doc
> values a couple times, but I think it'd be better to either:
>  - Directly index the field as a term frequency instead of doc
> values, e.g. using FeatureField. One downside is that you can only
> sort in one order efficiently.
>  - Or use LongDistanceFeatureQuery if your field is also indexed
> with points, by passing the max value of your index as the "origin" if
> you want to sort in decreasing order and the min value if you want to
> sort in increasing order. This would be a bit less efficient than
> FeatureField but would allow sorting in either ascending or descending
> order.
>
>
>
> On Tue, Jul 2, 2019 at 3:01 PM Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> >
> > Our Sort Fields utilize DocValues.
> >
> > Let's say I collect min-max ords of a Sort Field for a block of documents
> > (128, 256 etc.) at index time via a Codec & store it as part of DocValues
> > at a Segment level.
> >
> > During query time, could we take advantage of these stats when a Top-N
> > query with a Sort Field is requested?
> >
> > Typically, what I had in mind is a SortStats class with the following
> > method:
> >
> > int seek(int maxDocSeenTillNow, int minSortOrdSeenTillNow,
> >          boolean sortDesc) {
> >   // 1. Fetch the doc ranges that have ords >= minSortOrdSeenTillNow
> >   // 2. Return the least doc range >= maxDocSeenTillNow (if sortDesc = true)
> >   //    Return the least doc range <= maxDocSeenTillNow (if sortDesc = false)
> > }
> >
> > The Top-N Collector can keep track of the maxDocSeenTillNow &
> > minSortOrdSeenTillNow variables during query time & then call
> > SortStats.seek() for a possible skip of blocks of documents that may
> > otherwise be needlessly offered & popped out from the priority queue.
> >
> > I understand this simplistic logic depends on sort-field data distribution
> > & won't work for multi-sort field queries or out-of-order scoring etc.
> >
> > But, in general will this be a good idea to explore or something that is
> > best not attempted?
> >
> > Any help is much appreciated
> >
> > --
> > Ravi
>
>
>
> --
> Adrien
>
