lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Query processing with Lucene
Date Wed, 09 Jan 2008 14:49:16 GMT
On Tuesday 08 January 2008 22:49:18 Doron Cohen wrote:
> This is done by Lucene's scorers. You should however start
> in http://lucene.apache.org/java/docs/scoring.html, - scorers
> are described in the "Algorithm" section. "Offsets" are used
> by Phrase Scorers and by Span Scorer.

That is for the case that offsets were meant to be positions
within a document.

It is also possible that offsets were meant in the sense of using
skipTo(doc) instead of next() on a Scorer. This is done during
query search when at least one term is required.

Regards,
Paul Elschot


> 
> Doron
> 
> On Jan 8, 2008 11:24 PM, Marjan Celikik < celikik@gmail.com> wrote:
> 
> > Doron Cohen wrote:
> > > Hi Marjan,
> > >
> > > Lucene process the query in what can be called
> > > one-doc-at-a-time.
> > >
> > > For the example query - x y - (not the phrase query "x y") - all
> > > documents containing either x or y are considered a match.
> > >
> > > When processing the query - x y - the posting lists of these two
> > > index terms are traversed, and for each document met on the way,
> > > a score is computed (taking into account both terms), and "collected".
> > > At the end of the traversal, usually best N collected docs are returned
> > as
> > > search result. So, this is an exhaustive computation creating a union of
> > > the two posting. For the query - +x +y - in intersection rather than
> > > union is required, and the way Lucene does it is again to traverse
> > > the two posting lists, just that only documents seen in both lists
> > > are scored and collected. This allows to optimize the search,
> > > skipping large chunks of the posting lists, especially when
> > > one term is rarer than the other.
> > >
> > Thank you for your answer.
> >
> > I am having trouble finding the function which traverses the documents
> > such that they get scored. Can you
> > please tell me where the posting lists (for a +x +y query) get
> > intersected after they get read (by next() I guess)
> > from the index?
> >
> > In particular, I am interested in how does Lucene get the new positions
> > (offsets) of the documents seen
> > in both posting lists, i.e. positions (in a document) for the query word
> > x, and positions for the query word y.
> >
> > Thank you in advance!
> >
> > Marjan.
> >
> 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message