lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Query, Searcher, Weight, Similarity = ?
Date Fri, 29 Jun 2012 17:50:36 GMT
On Fri, Jun 29, 2012 at 9:02 AM, Arjun Dhar <dhar_ar@yahoo.com> wrote:
> Hi,
> I'm new and that is my disclaimer to the stupid question I am about to ask.
>
> Am trying to form a conceptual picture of the relation between Query <-->
> Weight <--> IndexReader, Scorer, Searcher <--> Similarity
>
> *From what I gather : (and someone please validate or correct me) *
> 1. We want *Queries* to be RE-USABLE instances hence *Weight* is a specific
> Queries state !?

Queries are independent of a Searcher. When a Query is executed, it
creates a Weight specifically for that searcher. The Weight contains
things like IDF computations: collection-wide state.
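
To make that concrete, here is a rough toy model of the relationship. None
of these classes are Lucene's: ToySearcher, ToyTermQuery and ToyWeight are
hypothetical stand-ins, and the IDF formula is only illustrative. The point
is that one reusable query object yields a different Weight for each
searcher it is run against.

```java
import java.util.List;

// Toy model only: these are NOT Lucene classes, just stand-ins for the roles.
final class WeightSketch {

    /** Stand-in for a searcher: it can answer collection-wide statistics. */
    static final class ToySearcher {
        private final List<List<String>> docs;

        ToySearcher(List<List<String>> docs) { this.docs = docs; }

        int docCount() { return docs.size(); }

        int docFreq(String term) {
            int n = 0;
            for (List<String> doc : docs) {
                if (doc.contains(term)) n++;
            }
            return n;
        }
    }

    /** Stand-in for a query: immutable and free of any searcher state. */
    static final class ToyTermQuery {
        final String term;

        ToyTermQuery(String term) { this.term = term; }

        /** The searcher-specific state is computed here and stored in the weight. */
        ToyWeight createWeight(ToySearcher searcher) {
            // Illustrative IDF formula (an assumption, not a claim about Lucene).
            double idf = 1.0 + Math.log(
                (double) searcher.docCount() / (searcher.docFreq(term) + 1));
            return new ToyWeight(this, idf);
        }
    }

    /** Stand-in for a weight: the query plus collection-wide state. */
    static final class ToyWeight {
        final ToyTermQuery query;
        final double idf;

        ToyWeight(ToyTermQuery query, double idf) {
            this.query = query;
            this.idf = idf;
        }
    }

    public static void main(String[] args) {
        ToyTermQuery query = new ToyTermQuery("lucene");
        ToySearcher small = new ToySearcher(
            List.of(List.of("lucene", "search"), List.of("lucene")));
        ToySearcher large = new ToySearcher(
            List.of(List.of("lucene"), List.of("solr"), List.of("index"), List.of("query")));
        // Same query instance, two searchers, two different weights.
        System.out.println("idf (small) = " + query.createWeight(small).idf);
        System.out.println("idf (large) = " + query.createWeight(large).idf);
    }
}
```

The query itself never changes; everything that depends on the collection
lives in the weight.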

> 2. *Searcher* is STATEFUL, and though it processes a *Query*, the state for
> that *Searcher* is delegated to the WEIGHT !?

Searcher wraps an IndexReader (usually a composite IndexReader
containing multiple segments, like a DirectoryReader) to provide search
capabilities. It also has extension points that are search specific:
one of these is Similarity, but there are others. For example, in 4.0
you can override methods to provide collection-wide stats where the
collection is distributed: consisting of indexes across multiple
machines.

> 3. *IndexReader* Reads an Index, and the *Searcher* uses the Reader to
> SEARCH, using a QUERY

yes.

> 4.  From the JavaDocs of Weight class ----> "IndexReader dependent state
> should reside in the Scorer. " -- Means, when *weights* are calculated, the
> final result of the Calculation goes into a STATEFUL object represented by
> the *Scorer* which is also Iterable !?

This could maybe be clarified to say per-segment state. So if you have
an IndexSearcher wrapping a DirectoryReader with 4 index segments, in
the typical case the Weight holds the state of the entire collection:
e.g. IDF across all 4 segments. The Weight creates 4 Scorers: a Scorer
for each segment in that DirectoryReader. Any per-segment information
such as the document length normalization ("norms") array resides in
each of those Scorers.
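
A rough sketch of that split, again as a toy model rather than Lucene code:
ToySegment, ToyWeight and ToyScorer are hypothetical names, and the scoring
formula (sqrt(tf) * idf * norm) is only an assumption for illustration. The
weight computes IDF once over all segments, then hands out one scorer per
segment, and each scorer reads only its own segment's norms.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene classes.
final class PerSegmentSketch {

    /** One segment: per-document term frequency and length norm, by local doc id. */
    static final class ToySegment {
        final int[] termFreqs; // 0 means the document does not match
        final float[] norms;   // per-segment length normalization values

        ToySegment(int[] termFreqs, float[] norms) {
            if (termFreqs.length != norms.length) {
                throw new IllegalArgumentException("one norm per document expected");
            }
            this.termFreqs = termFreqs;
            this.norms = norms;
        }
    }

    /** Collection-wide state: IDF is computed across all segments. */
    static final class ToyWeight {
        final double idf;

        ToyWeight(List<ToySegment> allSegments) {
            int numDocs = 0;
            int docFreq = 0;
            for (ToySegment segment : allSegments) {
                numDocs += segment.termFreqs.length;
                for (int tf : segment.termFreqs) {
                    if (tf > 0) docFreq++;
                }
            }
            this.idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        }

        /** One scorer per segment. */
        ToyScorer scorer(ToySegment segment) { return new ToyScorer(this, segment); }
    }

    /** Per-segment state: iterates the matching docs of a single segment. */
    static final class ToyScorer {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final ToyWeight weight;
        private final ToySegment segment;
        private int doc = -1;

        ToyScorer(ToyWeight weight, ToySegment segment) {
            this.weight = weight;
            this.segment = segment;
        }

        int nextDoc() {
            if (doc == NO_MORE_DOCS) return doc;
            do {
                doc++;
            } while (doc < segment.termFreqs.length && segment.termFreqs[doc] == 0);
            if (doc >= segment.termFreqs.length) doc = NO_MORE_DOCS;
            return doc;
        }

        double score() {
            return Math.sqrt(segment.termFreqs[doc]) * weight.idf * segment.norms[doc];
        }
    }

    /** Scores every matching document, segment by segment. */
    static List<Double> search(List<ToySegment> segments) {
        ToyWeight weight = new ToyWeight(segments);
        List<Double> scores = new ArrayList<>();
        for (ToySegment segment : segments) {
            ToyScorer scorer = weight.scorer(segment);
            while (scorer.nextDoc() != ToyScorer.NO_MORE_DOCS) {
                scores.add(scorer.score());
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<ToySegment> segments = List.of(
            new ToySegment(new int[] {1, 0}, new float[] {1f, 1f}),
            new ToySegment(new int[] {0, 4}, new float[] {1f, 0.5f}));
        System.out.println(search(segments));
    }
}
```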

> 5. *Searcher* can be assigned a *Similarity* algorithm. ... hence using that
> algorithm, it calculates *Weight*, which eventually leads to the
> construction of an Iterable *Scorer* !?

A Similarity is a hook for term weighting. But term weighting is not
the entire scoring algorithm in many cases: Scorers don't have to use
Similarity to compute things: they can use whatever logic they want.
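
As a minimal illustration of "hook, not the whole algorithm", here is a toy
sketch with hypothetical interfaces (ToySimilarity and ToyScorer are made-up
names, not Lucene's). One scorer delegates to the pluggable similarity; the
other ignores it entirely and returns a constant.

```java
// Toy model only: NOT Lucene interfaces.
final class SimilarityHookSketch {

    /** Pluggable term-weighting hook. */
    interface ToySimilarity {
        double score(int termFreq, double idf);
    }

    /** Anything that can produce a score for the current document. */
    interface ToyScorer {
        double score();
    }

    /** A scorer that delegates its term weighting to the similarity. */
    static ToyScorer termScorer(ToySimilarity similarity, int termFreq, double idf) {
        return () -> similarity.score(termFreq, idf);
    }

    /** A scorer that never consults any similarity. */
    static ToyScorer constantScorer(double boost) {
        return () -> boost;
    }

    public static void main(String[] args) {
        ToySimilarity sqrtTf = (tf, idf) -> Math.sqrt(tf) * idf;
        System.out.println(termScorer(sqrtTf, 4, 1.5).score());
        System.out.println(constantScorer(2.0).score());
    }
}
```

Swapping the similarity changes what the first scorer returns, and has no
effect on the second.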

>
> 6. While Indexing, its simple there is a direct relation between
> IndexWriterConfig <--> Similarity

This is for computing document length normalization information
("norms") at indexing time. Currently that's the only way that
IndexWriter interacts with Similarity.
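
A toy sketch of that one interaction, under stated assumptions:
ToyIndexWriter and ToySimilarity are hypothetical classes, whitespace
tokenization stands in for real analysis, and the 1/sqrt(length) norm used in
main is just an example formula. The writer asks the similarity for a norm
once per added document and stores the result.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene classes.
final class IndexTimeNormSketch {

    /** Index-time hook: turns a field length into a stored norm. */
    interface ToySimilarity {
        float computeNorm(int fieldLength);
    }

    /** Stand-in for an index writer configured with a similarity. */
    static final class ToyIndexWriter {
        private final ToySimilarity similarity;
        private final List<Float> norms = new ArrayList<>();

        ToyIndexWriter(ToySimilarity similarity) { this.similarity = similarity; }

        void addDocument(String text) {
            String trimmed = text.trim();
            // Whitespace tokenization, purely for illustration.
            int length = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            // The only place the writer consults the similarity.
            norms.add(similarity.computeNorm(length));
        }

        List<Float> norms() { return norms; }
    }

    public static void main(String[] args) {
        ToySimilarity lengthNorm = length -> (float) (1.0 / Math.sqrt(length));
        ToyIndexWriter writer = new ToyIndexWriter(lengthNorm);
        writer.addDocument("a b c d");
        writer.addDocument("one");
        System.out.println(writer.norms());
    }
}
```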

>
> +Q) Apart from the validation of my understanding, is there a Sequence
> Diagram explaining the process of calculation, during a Query?

Have a look at https://builds.apache.org/job/Lucene-trunk/javadoc/ and
click "Searching and Scoring in Lucene". I don't think there are any
diagrams there, but there is more information available.

>
> +Q) There are different implementations of Queries. Do they differ in how
> they mash up all the other stuff?
> Looks like if i mess each of the other entities, I can pretty much produce
> whatever Query?!

See the link above for more information, especially the section on
writing custom queries.

-- 
lucidimagination.com


