lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: A lot of short documents, optimal query?
Date Fri, 11 Nov 2005 22:04:56 GMT

: Wouldn't it make sense to have BooleanFilter,
: TermFilter, MultiTermFilter, RangeFilter... fammily to
: "mirror"  xxxQuery world with same idioms and
: interfaces? Is this the direction allready taken in
: Lucene development (an alternative would be to
: parametrize existiong Query world). How I see it
: functionaly, at a moment filters (and thir
: combination) are the only way to use fast "pure
: boolean" model.
: Does this what I just said makes any sense?

It makes perfect sense, and you have grasped a ot of the possibilities.
While making a version of Filter varient of every Query class is my gut
instinct, there has in fact been discussion about generalizing Queries so
that they can have "non-scoring" mode.  these issues have all been
mentioned in LUCENE-383 ...

One of the big reasons why it might make sense to use Queries instead of
Filters even if you don't care about scoring is when you have a large set
of very restrictive conditions.  (ie: A BooleanQuery consisting of many
TermQueries).  the BooleanScorer can make good decisisons to skip over
large sets of documents -- sometimes ignoring sub queries entirely -- when
one sub query only matches a few documents because of hte flexability of
the Scorer API.

The Filter API on the other hand doesn't have this flexability.  There is
not way for a ChainedFilter/BooleanFilter to know that it can skip over
one of it's sub filters, or ask one of it's sub filters to only look at
certain documents.

I suggested the full Filter approach for your situation based on the
following information...
  1) you didn't care about scoring
  2) you were using Range/Prefix queries on teh ZIP field that could
     easily exceed practicle clause limits in BooleanQuery.
  3) your restrictions on the ZIP field looked like they could be cached
     individually so the and the results reused accross many searches


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message