lucene-dev mailing list archives

From "none none" <>
Subject Re: Iterators for collecting Terms from Queries
Date Fri, 14 Mar 2003 17:17:05 GMT
Hi Tatu,
I didn't look through all the code, but at first glance it looks nice and I like the idea. I do
have something to say, though.
When we run a search, we already collect all the terms (I mentioned this in a previous email;
see the "rewrite" method). Your idea is very elegant from a programming-style point of view,
but I believe it slows performance down compared to mine. My idea adds no extra classes, or
at most one, although a lot of the changes are made in the Lucene core. The main difference
can be seen as follows:
My case:
1) Set a boolean value inside the Query class to true: collectTerms(true);
2) Run the search.
3) The searcher (the reader, actually) calls the "rewrite" method, or some other method; inside
this method we check whether the user wants to collect the terms by testing the public boolean
collectTerms. This avoids unnecessary memory consumption for users who don't need to collect
the terms.
4) The terms are now in memory in the various query classes, depending on the user's query,
e.g. a BooleanQuery of two MultiTermQuerys. The user can then collect and use them however he
wants: e.g. I collect them and store them in an array of Clauses; someone else may just want
to put them in a plain array.
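
A minimal Java sketch of steps 1-4, assuming hypothetical stand-in classes (SketchReader, SketchQuery, SketchPrefixQuery, SketchBooleanQuery, FlagCollectDemo) rather than the real Lucene API: each prefix is expanded once during the search's rewrite step, and the expansion is kept only when the collectTerms flag is on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for an index reader: a sorted term dictionary that
// counts how many times a prefix is expanded against it.
class SketchReader {
    private final TreeSet<String> terms;
    int enumerations = 0;

    SketchReader(List<String> terms) { this.terms = new TreeSet<>(terms); }

    List<String> termsWithPrefix(String prefix) {
        enumerations++;
        List<String> out = new ArrayList<>();
        // Matching terms are contiguous in sorted order, starting at the first term >= prefix.
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }
}

// Hypothetical query base class carrying the proposed opt-in flag (step 1).
abstract class SketchQuery {
    public boolean collectTerms = false; // off by default: no extra memory
    public void collectTerms(boolean on) { this.collectTerms = on; }
    abstract void rewrite(SketchReader reader);          // called during the search
    abstract void addCollectedTerms(List<String> sink); // step 4
}

class SketchPrefixQuery extends SketchQuery {
    private final String prefix;
    private final List<String> collected = new ArrayList<>();
    SketchPrefixQuery(String prefix) { this.prefix = prefix; }

    @Override void rewrite(SketchReader reader) {
        List<String> expanded = reader.termsWithPrefix(prefix); // needed for the search anyway
        collected.clear();
        if (collectTerms) collected.addAll(expanded);           // step 3: keep only on request
    }
    @Override void addCollectedTerms(List<String> sink) { sink.addAll(collected); }
}

class SketchBooleanQuery extends SketchQuery {
    private final List<SketchQuery> clauses = new ArrayList<>();
    SketchBooleanQuery add(SketchQuery q) { clauses.add(q); return this; }

    @Override public void collectTerms(boolean on) { // propagate the flag to the clauses
        super.collectTerms(on);
        for (SketchQuery q : clauses) q.collectTerms(on);
    }
    @Override void rewrite(SketchReader reader) {
        for (SketchQuery q : clauses) q.rewrite(reader);
    }
    @Override void addCollectedTerms(List<String> sink) {
        for (SketchQuery q : clauses) q.addCollectedTerms(sink);
    }
}

class FlagCollectDemo {
    public static void main(String[] args) {
        SketchReader reader = new SketchReader(Arrays.asList("apple", "apply", "banana", "band", "cat"));
        SketchBooleanQuery query = new SketchBooleanQuery()
            .add(new SketchPrefixQuery("app"))
            .add(new SketchPrefixQuery("ban"));
        query.collectTerms(true);               // step 1
        query.rewrite(reader);                  // steps 2-3: the search rewrites the query
        List<String> terms = new ArrayList<>();
        query.addCollectedTerms(terms);         // step 4
        System.out.println(terms + " enumerations=" + reader.enumerations);
    }
}
```

Running FlagCollectDemo prints [apple, apply, banana, band] enumerations=2: each prefix was expanded against the reader only once, and with the flag left off nothing extra would be kept.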

Your case:
1) Run the search.
2) The searcher collects all the terms, because it needs them to produce the search results.
3) Use your term collector to collect the terms. Note: this repeats work the searcher has
already done! So I think it is a waste of resources and time, and as a result performance
slows down.
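
For comparison, a toy Java sketch of the two-pass pattern as this message describes it (not Tatu's actual code), using hypothetical classes (CountingReader, Node, PrefixNode, BooleanNode, TwoPassDemo): the search expands each prefix to produce results and discards the expansion, then a separate collector walks the same query tree and expands every prefix against the reader a second time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical term dictionary standing in for an index reader;
// it counts how often a prefix is expanded against it.
class CountingReader {
    private final TreeSet<String> terms;
    int enumerations = 0;
    CountingReader(List<String> terms) { this.terms = new TreeSet<>(terms); }

    List<String> expandPrefix(String prefix) {
        enumerations++;
        List<String> out = new ArrayList<>();
        // Matching terms are contiguous in sorted order.
        for (String t : terms.tailSet(prefix)) {
            if (!t.startsWith(prefix)) break;
            out.add(t);
        }
        return out;
    }
}

// Hypothetical query tree: a prefix leaf and a boolean container.
interface Node {}
class PrefixNode implements Node {
    final String prefix;
    PrefixNode(String prefix) { this.prefix = prefix; }
}
class BooleanNode implements Node {
    final List<Node> clauses;
    BooleanNode(Node... clauses) { this.clauses = Arrays.asList(clauses); }
}

class TwoPassDemo {
    // Pass 1: the search expands each prefix (scoring omitted) and throws the result away.
    static void search(Node q, CountingReader r) {
        if (q instanceof PrefixNode) {
            r.expandPrefix(((PrefixNode) q).prefix);
        } else {
            for (Node c : ((BooleanNode) q).clauses) search(c, r);
        }
    }

    // Pass 2: a separate collector walks the same tree and expands each prefix again.
    static void collect(Node q, CountingReader r, List<String> sink) {
        if (q instanceof PrefixNode) {
            sink.addAll(r.expandPrefix(((PrefixNode) q).prefix));
        } else {
            for (Node c : ((BooleanNode) q).clauses) collect(c, r, sink);
        }
    }

    public static void main(String[] args) {
        CountingReader reader = new CountingReader(
            Arrays.asList("apple", "apply", "banana", "band", "cat"));
        Node query = new BooleanNode(new PrefixNode("app"), new PrefixNode("ban"));
        search(query, reader);
        List<String> terms = new ArrayList<>();
        collect(query, reader, terms);
        System.out.println(terms + " enumerations=" + reader.enumerations);
    }
}
```

TwoPassDemo prints the same four terms, [apple, apply, banana, band], but with enumerations=4, because each prefix was expanded once by search and once by collect. The counter only shows that the expansion is repeated; it says nothing about how large the time difference is in practice, which this message leaves open.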

I want to underline that the time to put an object in an array and get it back is the same in
both cases; the difference is calling the reader twice instead of once.
I am not sure how big the difference between the two cases is, but logically I think there
has to be one, even more so when we deal with PrefixQuery or RangeQuery (which is mainly where
we need the collector, actually!).
It may sound weird, but I lost all the data on my PC this Monday, so I can't compare them, and
I also have to implement my idea again.

Let me know what you think,
