lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: discontinuous range query
Date Thu, 05 Oct 2006 19:13:12 GMT

: It's clear that my problem here comes from a lack of understanding of
: the semantics of SHOULD, MUST, and MUST_NOT.
: I haven't found a clear description of this (except for a brief
: comment here
: Most of the other descriptions I have found discuss it as if they
: were boolean operators, which they aren't, quite.

they are most certainly not boolean operators ... the fact that
BooleanQuery has "Boolean" in its name, and the fact that QueryParser
*tries* to support boolean operators like "AND" and "OR" are the top two
gripes i have about Lucene.

I hadn't seen that post by Doug that you linked to, but it is dead on.

In my opinion, the best way to think about MUST, MUST_NOT, and SHOULD is
to start by forcing yourself to remember that a query is not just about
selecting matches -- it's about scoring those matches.

The second important thing when looking at "nested" BooleanQueries, is
that you have to view each level as its own entity.

when you have something like:   (A +B) -(C D -E) +(F -G)

look at each of the "second level" queries in isolation:

  A +B   ... docs must match B, if they match A their scores are boosted
  C D -E ... docs can not match E, must match either C or D
             (docs will score better if they match both C and D)
  F -G   ... docs can not match G, which means that to match this query
             they must match F

now consider the outermost query as an abstraction:  X -Y +Z
   ... docs can not match Y
   ... docs matching X will get a score boost
   ... docs must match Z

Putting all of that information back together we see that B isn't really
mandatory in the "big picture", the only thing that's really mandatory is
F ... matching B is just required in order to get a score contribution
from A, etc...
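
The level-by-level reading above can be sketched as a small toy model in
plain Java. This is not Lucene's implementation and it uses no Lucene
classes: the names BooleanQuerySketch, Bool, term, doc and example are
made up for illustration, a "document" is just a set of terms, and the
"score" is simply the sum of matching clause scores (real Lucene scoring
is more involved). The only rules it encodes are the ones described
above: a MUST_NOT match excludes the doc, every MUST has to match, and
when a level has no MUST clauses at least one SHOULD has to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BooleanQuerySketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Toy query: a score > 0 means "matches", 0 means "does not match". */
    interface Query { double score(Set<String> docTerms); }

    /** Leaf query: matches when the document contains the term. */
    static Query term(String t) {
        return docTerms -> docTerms.contains(t) ? 1.0 : 0.0;
    }

    /** One level of clauses, evaluated as its own entity. */
    static class Bool implements Query {
        private final List<Occur> occurs = new ArrayList<>();
        private final List<Query> queries = new ArrayList<>();

        Bool add(Occur occur, Query q) {
            occurs.add(occur);
            queries.add(q);
            return this;
        }

        @Override
        public double score(Set<String> docTerms) {
            boolean hasMust = false;
            int shouldMatched = 0;
            double total = 0.0;
            for (int i = 0; i < queries.size(); i++) {
                double s = queries.get(i).score(docTerms);
                boolean matched = s > 0.0;
                switch (occurs.get(i)) {
                    case MUST_NOT:
                        if (matched) return 0.0;   // prohibited clause matched
                        break;
                    case MUST:
                        hasMust = true;
                        if (!matched) return 0.0;  // required clause missing
                        total += s;
                        break;
                    case SHOULD:
                        if (matched) { shouldMatched++; total += s; }
                        break;
                }
            }
            // With no MUST clauses, at least one SHOULD clause has to match.
            if (!hasMust && shouldMatched == 0) return 0.0;
            return total;
        }
    }

    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    /** Builds (A +B) -(C D -E) +(F -G), i.e. X -Y +Z at the outer level. */
    static Query example() {
        Query x = new Bool().add(Occur.SHOULD, term("A")).add(Occur.MUST, term("B"));
        Query y = new Bool().add(Occur.SHOULD, term("C")).add(Occur.SHOULD, term("D"))
                            .add(Occur.MUST_NOT, term("E"));
        Query z = new Bool().add(Occur.SHOULD, term("F")).add(Occur.MUST_NOT, term("G"));
        return new Bool().add(Occur.SHOULD, x).add(Occur.MUST_NOT, y).add(Occur.MUST, z);
    }

    public static void main(String[] args) {
        Query q = example();
        String[][] docs = { {"F"}, {"A", "F"}, {"B", "F"}, {"A", "B", "F"},
                            {"B"}, {"F", "G"}, {"C", "F"}, {"C", "E", "F"} };
        for (String[] d : docs) {
            System.out.println(Arrays.toString(d) + " -> " + q.score(doc(d)));
        }
    }
}
```

In this model a doc containing only F matches, A without B adds nothing
to the score, and a doc containing C, E and F still matches: E keeps the
inner (C D -E) query from matching, so the outer level has nothing to
exclude.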

Looking at the Explanation toString for any complex structure of
BooleanQueries should help with all of this ... you'll get more info than
you want, but not more than you can use.

(and fortunately: Explanations work even if the document doesn't match,
and they explain why a document doesn't match)

