lucene-dev mailing list archives

Subject Re: Query Term Collector (was: Re: New highlighter package available)
Date Sun, 05 Oct 2003 09:15:14 GMT
Here are some very important reasons why getTerms() shouldn't be added as a method to Query:

Query objects are seen by Lucene users as reusable objects.
E.g. they could be used as routing queries which are run repeatedly to classify incoming documents.
They are reusable across multiple indexes and index versions, i.e. they hold no state about
specific indexes. That's the current contract.

If you decided to slap a method called getTerms() on a query which returns expansions of multi-terms,
that is adding state which effectively ties the Query instance to a particular index and a
snapshot of that index's content, so the query is no longer reusable.

It is useful to think of Queries in two forms:

1) High-level, reusable, index- and index-version-independent objects (returned by QueryParser)
2) Targeted queries associated with a particular version of an index, used briefly then discarded.

Now, type 2 ("targeted") is the query returned by query.rewrite(reader), and was until recently
used exclusively by the search process and subsequently thrown away.

The new highlighting code also requires the use of "targeted queries", but it is not possible
to get hold of the targeted query that is the by-product of the search. This is why the caller is
required to create a "targeted" query by calling rewrite, THEN call the search and highlight
functions with this version.
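
To make the two forms concrete, here is a minimal sketch in Java. The Toy* classes below are
hypothetical stand-ins written for this illustration and are NOT Lucene's classes; the only thing
taken from the discussion above is the idea that rewrite returns a new, index-specific query while
the original query keeps no index state, and that the caller hands the same rewritten query to
both the search and the highlighter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-ins for illustration only; these are NOT Lucene's classes.

/** Type 1: reusable and index-independent. Holds only the prefix, no index state. */
final class ToyPrefixQuery {
    final String prefix;

    ToyPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    /** Expands against one snapshot and returns a NEW object; 'this' is left untouched. */
    ToyTargetedQuery rewrite(ToyIndexSnapshot snapshot) {
        Set<String> expanded = new TreeSet<>();
        for (String term : snapshot.terms()) {
            if (term.startsWith(prefix)) {
                expanded.add(term);
            }
        }
        return new ToyTargetedQuery(expanded);
    }
}

/** Type 2: tied to the snapshot it was rewritten against; used briefly, then discarded. */
final class ToyTargetedQuery {
    final Set<String> terms;

    ToyTargetedQuery(Set<String> terms) {
        this.terms = terms;
    }
}

/** A fixed list of whitespace-separated documents standing in for one index version. */
final class ToyIndexSnapshot {
    final List<String> docs;

    ToyIndexSnapshot(String... docs) {
        this.docs = Arrays.asList(docs);
    }

    /** All distinct words in this snapshot, sorted. */
    Set<String> terms() {
        Set<String> all = new TreeSet<>();
        for (String doc : docs) {
            all.addAll(Arrays.asList(doc.split(" ")));
        }
        return all;
    }

    /** Returns every document containing at least one of the targeted terms. */
    List<String> search(ToyTargetedQuery query) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            for (String word : doc.split(" ")) {
                if (query.terms.contains(word)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }
}

/** Wraps each targeted term in <b> tags; it needs the expanded terms, not the prefix. */
final class ToyHighlighter {
    static String highlight(ToyTargetedQuery query, String text) {
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(query.terms.contains(word) ? "<b>" + word + "</b>" : word);
        }
        return out.toString();
    }
}

final class ToyDemo {
    public static void main(String[] args) {
        ToyPrefixQuery reusable = new ToyPrefixQuery("high");
        ToyIndexSnapshot v1 = new ToyIndexSnapshot("highlight the terms", "plain text");
        ToyIndexSnapshot v2 = new ToyIndexSnapshot("highway code", "higher ground");

        // The caller rewrites once, then passes the SAME targeted query to both steps.
        ToyTargetedQuery targeted = reusable.rewrite(v1);
        for (String hit : v1.search(targeted)) {
            System.out.println(ToyHighlighter.highlight(targeted, hit));
        }

        // The original query is unchanged and can be rewritten against another snapshot.
        System.out.println(reusable.rewrite(v2).terms);
    }
}
```

Note that the expanded terms live only on the throwaway targeted object. A getTerms() on the
reusable query would have to store one snapshot's expansion on the query itself, which is
exactly the state the current contract rules out.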

These query types are important distinctions to preserve and the getTerms() proposal 
doesn't respect these subtle differences in query usage.


>>I looked at your code quickly, can you confirm that the following scenario is what
>>happens when you run a search with MultiTermQuery?

Not true any more. I think you're looking at outdated code.
See my recent post, which described how I ripped out the rewrite calls in the latest highlighter
and made it the caller's responsibility:
As for "prohibited" - note the highlighter takes a "prohibited" parameter too.
