lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doron Cohen" <cdor...@gmail.com>
Subject Re: Query processing with Lucene
Date Tue, 08 Jan 2008 19:29:46 GMT
Hi Marjan,

Lucene process the query in what can be called
one-doc-at-a-time.

For the example query - x y - (not the phrase query "x y") - all
documents containing either x or y are considered a match.

When processing the query - x y - the posting lists of these two
index terms are traversed, and for each document met on the way,
a score is computed (taking into account both terms), and "collected".
At the end of the traversal, usually best N collected docs are returned as
search result. So, this is an exhaustive computation creating a union of
the two posting. For the query - +x +y - in intersection rather than
union is required, and the way Lucene does it is again to traverse
the two posting lists, just that only documents seen in both lists
are scored and collected. This allows to optimize the search,
skipping large chunks of the posting lists, especially when
one term is rarer than the other.

You can read more on Lucene scoring in Lucene's documentation,
http://lucene.apache.org/java/docs/scoring.html is a good starting
point,

HTH,
Doron

On Jan 6, 2008 2:13 PM, Marjan Celikik <celikik@gmail.com> wrote:

> Dear all,
>
> Maybe this topic is already discussed (then can I get a reference
> please?)... I would like to know how does Lucene actually process the
> query. For example, take a 2-word query "x y". Does Lucene fetch the
> lists of "x" and "y" and intersect them, or do they do something more
> fancy, for example, top-k techniques that try to avoid a full scan of
> the index lists for "x" and "y" ?
>
> Marjan.
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message