lucene-dev mailing list archives

From "none none" <>
Subject Re: Query Term Collector (was: Re: New highlighter package available)
Date Sat, 04 Oct 2003 19:48:29 GMT
Hi Mark,
I looked at your code quickly. Can you confirm that the following scenario is what happens
when you run a search with a MultiTermQuery?

- construct your query manually or using QueryParser
- run a search using IndexSearcher
- the searcher collects all the terms using "rewrite(IndexReader reader)"
- for each document whose terms you need (usually for highlighting), you call:
--- getTerms(Query query, HashSet terms, boolean prohibited), and because it is a MultiTermQuery,
I see you need to call "query.rewrite(reader)" again.
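To make the concern concrete, here is a minimal plain-Java model of the scenario above. TermDictionary, PrefixQueryModel and Scenario are hypothetical stand-ins, not Lucene classes. rewrite() simply expands a prefix against an in-memory term set, roughly the way a MultiTermQuery expands into primitive term queries, and a counter records how often the dictionary is scanned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory term dictionary standing in for an IndexReader's terms.
class TermDictionary {
    final Set<String> terms = new TreeSet<>();

    TermDictionary(String... ts) {
        for (String t : ts) terms.add(t);
    }
}

// Hypothetical stand-in for a MultiTermQuery (here: a prefix query); not Lucene's API.
class PrefixQueryModel {
    final String prefix;
    int rewriteCalls = 0; // how many times the term dictionary has been scanned

    PrefixQueryModel(String prefix) {
        this.prefix = prefix;
    }

    // Expands the prefix into the concrete terms present in the dictionary.
    List<String> rewrite(TermDictionary dict) {
        rewriteCalls++;
        List<String> expanded = new ArrayList<>();
        for (String t : dict.terms) {
            if (t.startsWith(prefix)) expanded.add(t);
        }
        return expanded;
    }
}

class Scenario {
    // Step 3: the searcher rewrites the query once before scoring.
    static List<String> runSearch(PrefixQueryModel q, TermDictionary dict) {
        return q.rewrite(dict);
    }

    // Step 4: a highlighter that only has the original query rewrites it again.
    static Set<String> getTerms(PrefixQueryModel q, TermDictionary dict) {
        return new TreeSet<>(q.rewrite(dict));
    }
}
```

Searching once and then extracting terms for three hits leaves rewriteCalls at 4: one scan for the search and three repeated scans for highlighting.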

If my assumption is correct, it seems to me that some resources are wasted, because the rewrite
method has already been called by the searcher.
That's why I added getTerms() and a few ArrayLists to hold the terms in the current instance of
the query. In my case each user search creates a new Query, so those arrays are released
at the end.
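The workaround described here can be sketched as a query that remembers the terms it expanded to. CachingPrefixQuery and its methods are hypothetical, and a plain Set<String> stands in for the index's term dictionary. rewrite() stores its expansion in a field, and getTerms() returns that stored list instead of scanning the dictionary again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical model: the query caches its expanded terms during the searcher's
// rewrite, so later per-document getTerms() calls reuse them.
class CachingPrefixQuery {
    final String prefix;
    int dictionaryScans = 0;
    private List<String> expandedTerms; // null until rewrite() has run

    CachingPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    List<String> rewrite(Set<String> dictionary) {
        dictionaryScans++;
        List<String> expanded = new ArrayList<>();
        for (String t : dictionary) {
            if (t.startsWith(prefix)) expanded.add(t);
        }
        expandedTerms = expanded; // remember the result for highlighting
        return expanded;
    }

    // Returns the cached terms; rewrites only if the searcher never did.
    List<String> getTerms(Set<String> dictionary) {
        if (expandedTerms == null) rewrite(dictionary);
        return Collections.unmodifiableList(expandedTerms);
    }
}
```

Because the cache lives on the query instance, it is discarded together with the query, which fits the one-query-per-user-search case. The trade-off is that a query reused against a different or updated index would return stale terms unless the cache is reset.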

Of course, Mark, there is room for improvement. I agree with you about finding a "home" for
getTerms(), and actually the home is already there! But we don't have the keys: the field
private Vector clauses = new Vector(); in BooleanQuery holds almost the same values my ArrayList
does; the difference is that prohibited clauses are there as well.
Also, I had to make it work as fast and as well as possible in a short time. Now that you have
opened my mind, I believe the getTerms method could get the terms from the clauses vector inside
BooleanQuery.
Maybe a boolean prohibited parameter could be passed to skip these clauses (which would save
highlighters some work).
I still believe that Query should have an abstract method getTerms(..); otherwise we have to
switch between the different query types to get them. A common way is always better, in my opinion.
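The abstract-method idea might look roughly like this. QueryModel, TermQueryModel, ClauseModel and BooleanQueryModel are hypothetical simplifications (real BooleanQuery clauses also carry a "required" flag, omitted here). The boolean argument plays the role of the proposed prohibited parameter, letting a highlighter skip NOT clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical base class with the proposed abstract getTerms(); not Lucene's Query.
abstract class QueryModel {
    // Adds this query's terms to 'terms'; prohibited clauses are visited only if 'prohibited' is true.
    abstract void getTerms(Set<String> terms, boolean prohibited);
}

class TermQueryModel extends QueryModel {
    final String term;

    TermQueryModel(String term) {
        this.term = term;
    }

    @Override
    void getTerms(Set<String> terms, boolean prohibited) {
        terms.add(term);
    }
}

class ClauseModel {
    final QueryModel query;
    final boolean prohibited;

    ClauseModel(QueryModel query, boolean prohibited) {
        this.query = query;
        this.prohibited = prohibited;
    }
}

class BooleanQueryModel extends QueryModel {
    // Plays the role of BooleanQuery's private 'clauses' vector.
    private final List<ClauseModel> clauses = new ArrayList<>();

    void add(QueryModel q, boolean prohibited) {
        clauses.add(new ClauseModel(q, prohibited));
    }

    @Override
    void getTerms(Set<String> terms, boolean prohibited) {
        for (ClauseModel c : clauses) {
            if (c.prohibited && !prohibited) continue; // skip NOT clauses, e.g. for highlighting
            c.query.getTerms(terms, prohibited);       // nested queries receive the same flag
        }
    }
}
```

Each query type answers for itself, so a caller never needs to switch on the concrete class; the cost is that every Query subclass must implement the method.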

Thank you,
Ciao Korfut.

--------- Original Message ---------

DATE: Sat, 4 Oct 2003 09:47:05 

>With regards to Korfut's TermCollector proposition:
>I do not like the new requirement for all query classes to implement getTerms(). This is effectively what they are currently
>required to do in the query.rewrite() method - express their high-level logic in primitive
>I believe the getTerms() implementation should make use of this existing feature of all query objects (as I have done in
>, and not create a new set of requirements for all query classes - let's not add complexity where it's
>not needed.
>So, I think the real question is: should there be a home for a getTerms() function that operates on primitive (rewritten) queries?
>We can move some of the logic somewhere into the core if the consensus is that
>this is a generally useful feature (though I have yet to think of one outside of highlighting).
>Incidentally, it may be of interest to note that I am busy packaging up a getTopTerms() feature that analyses the contents
>of query result sets and returns the "significant" terms and phrases found in the result set, based on their relative frequency
>compared to that of the corpus.
>It's quite effective and of use in query expansion and highlighting.
>This may be of interest to those proposing query.getTerms() changes.
>To unsubscribe, e-mail:
>For additional commands, e-mail:
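Mark's alternative leaves Query unchanged and puts one helper beside it that walks the already rewritten query, so it only has to recognise a handful of primitive types. The sketch below is a hypothetical model (Primitive, TermQ, PhraseQ, BoolQ and RewrittenQueryTerms are not Lucene classes). It confines the type switch that Korfut wanted to avoid to a single place.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical primitive query types: what a rewritten query tree is assumed to contain.
interface Primitive {}

class TermQ implements Primitive {
    final String term;
    TermQ(String term) { this.term = term; }
}

class PhraseQ implements Primitive {
    final List<String> terms;
    PhraseQ(List<String> terms) { this.terms = terms; }
}

class BoolQ implements Primitive {
    final List<Primitive> clauses = new ArrayList<>();
    final List<Boolean> prohibitedFlags = new ArrayList<>();

    BoolQ add(Primitive q, boolean prohibited) {
        clauses.add(q);
        prohibitedFlags.add(prohibited);
        return this;
    }
}

// One external helper that only understands primitive types, on the assumption
// that rewrite() has already reduced prefix/wildcard/fuzzy queries to them.
final class RewrittenQueryTerms {
    private RewrittenQueryTerms() {}

    static Set<String> getTerms(Primitive q, boolean includeProhibited) {
        Set<String> terms = new LinkedHashSet<>(); // keeps first-seen order
        collect(q, includeProhibited, terms);
        return terms;
    }

    private static void collect(Primitive q, boolean includeProhibited, Set<String> out) {
        if (q instanceof TermQ) {
            out.add(((TermQ) q).term);
        } else if (q instanceof PhraseQ) {
            out.addAll(((PhraseQ) q).terms);
        } else if (q instanceof BoolQ) {
            BoolQ b = (BoolQ) q;
            for (int i = 0; i < b.clauses.size(); i++) {
                if (b.prohibitedFlags.get(i) && !includeProhibited) continue; // skip NOT clauses
                collect(b.clauses.get(i), includeProhibited, out);
            }
        } else {
            throw new IllegalArgumentException("not a primitive query: " + q);
        }
    }
}
```

The design choice is the mirror image of an abstract getTerms(): query classes gain no new obligations, but the helper must be updated whenever a new primitive query type appears.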

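The message does not say how getTopTerms() scores terms beyond "relative frequency compared to that of the corpus". Here is a minimal sketch under one assumed formula: a term's score is the fraction of result documents containing it divided by the fraction of corpus documents containing it. TopTerms.getTopTerms and its parameters are hypothetical, and phrases are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ranks terms by how much more common they are in the
// result set than in the whole corpus. The scoring formula is an assumption.
final class TopTerms {
    private TopTerms() {}

    static List<String> getTopTerms(List<Set<String>> resultDocs,
                                    Map<String, Integer> corpusDocFreq,
                                    int corpusSize,
                                    int k) {
        // Document frequency of each term within the result set.
        Map<String, Integer> resultDocFreq = new HashMap<>();
        for (Set<String> doc : resultDocs) {
            for (String t : doc) resultDocFreq.merge(t, 1, Integer::sum);
        }
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : resultDocFreq.entrySet()) {
            // Guard against a missing or zero corpus count to avoid dividing by zero.
            int corpusDf = Math.max(1, corpusDocFreq.getOrDefault(e.getKey(), 1));
            double inResults = (double) e.getValue() / resultDocs.size();
            double inCorpus = (double) corpusDf / corpusSize;
            score.put(e.getKey(), inResults / inCorpus);
        }
        List<String> terms = new ArrayList<>(score.keySet());
        // Highest relative frequency first; ties broken alphabetically for a stable order.
        terms.sort((a, b) -> {
            int c = Double.compare(score.get(b), score.get(a));
            return c != 0 ? c : a.compareTo(b);
        });
        return terms.subList(0, Math.min(k, terms.size()));
    }
}
```

For example, with a 100-document corpus where "the" appears in 90 documents, "lucene" in 5 and "highlighter" in 2, and two hits containing {the, lucene, highlighter} and {the, lucene}, the scores come out at about 1.1, 20 and 25, so the top two terms are highlighter and lucene.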