lucene-solr-user mailing list archives

From Chris Hostetter
Subject Re: lucene parser, negative OR operands
Date Thu, 19 May 2011 01:07:21 GMT

: Thanks Yonik. I recall hearing about this before, but was vague on the
: details, thanks for supplying some and refreshing my memory.

matching in Lucene is additive ... queries must match *something*. a 
clause of a boolean query can be the negation of a query, but that only 
defines how documents should be removed from the set matched by the other 
queries in that boolean.

To put it another way: imagine modeling the list of documents matching a 
query as a bitset.  you can set bits to true, and you can set bits to 
false, but the bitset starts out with all bits as false, so if all 
you do is set bits to false, your bitset will *end* with all bits as false.
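
A minimal sketch of the bitset analogy above, using java.util.BitSet. The class and method names are made up for illustration, and this is not how Lucene itself is implemented; it only models "one bit per document".

```java
import java.util.BitSet;

// Hypothetical illustration of the bitset analogy; not Lucene code.
final class NegativeOnlyBitsetDemo {

    // Models a pure negative query such as "-red": start with all bits
    // false, then clear the bits of every document that matches "red".
    static BitSet pureNegative(int numDocs, BitSet excluded) {
        BitSet result = new BitSet(numDocs); // every bit starts out false
        result.andNot(excluded);             // clearing bits adds nothing
        return result;
    }

    // Models "*:* -red": set every bit first, then clear the excluded ones.
    static BitSet allMinus(int numDocs, BitSet excluded) {
        BitSet result = new BitSet(numDocs);
        result.set(0, numDocs);              // the "*:*" step
        result.andNot(excluded);
        return result;
    }

    public static void main(String[] args) {
        BitSet red = new BitSet(5);
        red.set(1);
        red.set(3);
        System.out.println(pureNegative(5, red)); // {}
        System.out.println(allMinus(5, red));     // {0, 2, 4}
    }
}
```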

: If I want to understand more about how the lucene query parser does its
: thing, can anyone suggest the source files I should be looking at?

the QueryParser.jj is the grammar for parsing, but the crux is to 
understand that the BooleanQuery class supports three types of clauses: 
PROHIBITED, MANDATORY, and OPTIONAL.  The QueryParser implements those as 
"-", "+" and the default behavior when neither +/- is present. The 
QueryParser also jumps through some hoops to support AND, OR, NOT but not 
all permutations of those are viable.
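
To make the three clause types concrete, here is a toy model of how they combine. It is a sketch under assumptions, not the real BooleanQuery: ToyBooleanQuery and its methods are hypothetical, each clause is represented by the set of doc ids its sub-query matches, and scoring is ignored entirely. The enum names follow the wording above; in the Lucene API the corresponding values are BooleanClause.Occur.MUST_NOT, MUST and SHOULD.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of boolean clause matching; not Lucene code.
final class ToyBooleanQuery {
    enum Occur { MANDATORY, OPTIONAL, PROHIBITED }

    private final List<Set<Integer>> mandatory = new ArrayList<>();
    private final List<Set<Integer>> optional = new ArrayList<>();
    private final List<Set<Integer>> prohibited = new ArrayList<>();

    // Each clause is given as the set of doc ids its sub-query matches.
    ToyBooleanQuery add(Set<Integer> docs, Occur occur) {
        switch (occur) {
            case MANDATORY: mandatory.add(docs); break;
            case OPTIONAL: optional.add(docs); break;
            default: prohibited.add(docs); break;
        }
        return this;
    }

    Set<Integer> matches() {
        Set<Integer> result = new TreeSet<>();
        if (!mandatory.isEmpty()) {
            // Every mandatory clause must match; optional clauses would
            // only influence scoring, which this model ignores.
            result.addAll(mandatory.get(0));
            for (Set<Integer> m : mandatory) {
                result.retainAll(m);
            }
        } else {
            // Without mandatory clauses, any optional clause may match.
            for (Set<Integer> o : optional) {
                result.addAll(o);
            }
        }
        // Prohibited clauses only remove documents; they never add any.
        for (Set<Integer> p : prohibited) {
            result.removeAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> red = Set.of(1, 3);
        Set<Integer> blue = Set.of(2, 3);
        // "red blue": both clauses optional
        System.out.println(new ToyBooleanQuery()
                .add(red, Occur.OPTIONAL).add(blue, Occur.OPTIONAL).matches()); // [1, 2, 3]
        // "+red -blue"
        System.out.println(new ToyBooleanQuery()
                .add(red, Occur.MANDATORY).add(blue, Occur.PROHIBITED).matches()); // [1]
        // "-red": nothing positive to start from
        System.out.println(new ToyBooleanQuery()
                .add(red, Occur.PROHIBITED).matches()); // []
    }
}
```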

: If I really do want actual boolean logic behavior, what are my options?  I
: guess one is trying to write my own query parser.

"boolean logic" generally is defined in some form relative to "the universe" 
... so a pure negative query like "-red" really means "all things IN THE 
UNIVERSE that are not 'red'" ... you can express that using "*:* -red"

What Solr does (and this is how the thread started) is that for top 
level queries (like "q=-red" or "fq=-red") it adds the *:* to the 
boolean query for you.
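
A rough sketch of that top-level fix-up, working on the query string only. This is an assumption-laden illustration, not Solr's actual code: fixPureNegative is a hypothetical helper, it splits clauses on whitespace, and it knows nothing about phrases, fields or parentheses.

```java
// Hypothetical illustration of "add *:* to a pure negative top-level query".
final class TopLevelNegativeFix {

    // If every whitespace-separated clause is prohibited ("-..."),
    // prepend "*:*" so the negations have something to subtract from.
    static String fixPureNegative(String query) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return trimmed;
        }
        for (String clause : trimmed.split("\\s+")) {
            if (!clause.startsWith("-")) {
                return trimmed; // a non-negative clause exists: leave as-is
            }
        }
        return "*:* " + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(fixPureNegative("-red"));             // *:* -red
        System.out.println(fixPureNegative("blue -red"));        // blue -red
        // Only the top level is inspected, so nested pure negative
        // sub-queries are left untouched.
        System.out.println(fixPureNegative("(-one) OR (-two)")); // (-one) OR (-two)
    }
}
```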

: Hmm, for that particular query, what about using parens to force a sub-query?
: (-one) OR (-two)
: Ha, nope, that runs into a different problem (or is it the same problem?), and
: always returns 0 hits.  It looks like the lucene query parser can't handle a
: pure-negative sub-query like that separated by OR?  Not sure why, can anyone
: explain that one?

the query parser can handle it, and it produces a valid query object, but 
that query object doesn't match anything. "-one" matches nothing, "-two" 
matches nothing ... nothing union nothing is still nothing.

: For that particular pattern, this crazy refactoring of the query does work and
: get the actual boolean logic result of "(not 'one') OR (not 'two')":
: (*:* AND -one) OR (*:* AND -two)

correct -- that is you formally saying "give me all docs IN THE UNIVERSE 
that are not 'one', and union that with all docs IN THE UNIVERSE that are 
not 'two'"
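
The difference between the two forms can be sketched with bitsets, one bit per document. The names here are hypothetical and the doc ids are made up; the point is only that each sub-query is evaluated on its own before the union.

```java
import java.util.BitSet;

// Hypothetical illustration of "(-one) OR (-two)" versus
// "(*:* AND -one) OR (*:* AND -two)"; not Lucene code.
final class NegativeSubQueryUnion {

    // Returns a copy of base with the excluded bits cleared.
    static BitSet minus(BitSet base, BitSet excluded) {
        BitSet copy = (BitSet) base.clone();
        copy.andNot(excluded);
        return copy;
    }

    static BitSet union(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.or(b);
        return copy;
    }

    // "(-one) OR (-two)": each sub-query starts from nothing.
    static BitSet pureNegativeUnion(BitSet one, BitSet two) {
        BitSet nothing = new BitSet();
        return union(minus(nothing, one), minus(nothing, two));
    }

    // "(*:* AND -one) OR (*:* AND -two)": each sub-query starts from all docs.
    static BitSet explicitUnion(int numDocs, BitSet one, BitSet two) {
        BitSet all = new BitSet(numDocs);
        all.set(0, numDocs);
        return union(minus(all, one), minus(all, two));
    }

    public static void main(String[] args) {
        BitSet one = new BitSet(4);
        one.set(0);
        one.set(1);
        BitSet two = new BitSet(4);
        two.set(1);
        two.set(2);
        System.out.println(pureNegativeUnion(one, two)); // {}
        System.out.println(explicitUnion(4, one, two));  // {0, 2, 3}
    }
}
```

Only doc 1, which matches both 'one' and 'two', is left out of the second result.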

: behavior for that pattern, but in general, I'm kind of wanting a parser that
: will give actual boolean logic behavior. Maybe someday I can find time to
: write it in Java (not the quickest thing for me, not familiar with the code at
: all).

You could implement a parser like that relatively easily -- just make sure 
you put a MatchAllDocsQuery in every BooleanQuery object that you 
construct, and only ever use the PROHIBITED and MANDATORY clause types 
(never OPTIONAL) ...  the thing is, a parser like that isn't as useful 
as you think it might be when dealing with search results.  "OPTIONAL" 
clauses are where most of the useful factors of scoring documents come 
into play.
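
A sketch of what such a parser's building blocks could look like, again modeled with bitsets instead of real query objects. Everything here is hypothetical: booleanQuery stands in for a BooleanQuery that always contains a match-all clause, and expressing OR through De Morgan's law is my assumption about how to get by without OPTIONAL clauses, not something spelled out above.

```java
import java.util.BitSet;

// Hypothetical model of "pure boolean logic" queries; not Lucene code.
final class PureBooleanSketch {
    private final int numDocs;

    PureBooleanSketch(int numDocs) {
        this.numDocs = numDocs;
    }

    // Models a BooleanQuery that always holds a match-all clause plus
    // MANDATORY and PROHIBITED clauses only (never OPTIONAL).
    BitSet booleanQuery(BitSet[] mandatory, BitSet[] prohibited) {
        BitSet result = new BitSet(numDocs);
        result.set(0, numDocs); // the MatchAllDocsQuery clause
        for (BitSet m : mandatory) {
            result.and(m);
        }
        for (BitSet p : prohibited) {
            result.andNot(p);
        }
        return result;
    }

    BitSet not(BitSet a) {
        return booleanQuery(new BitSet[0], new BitSet[] {a});
    }

    BitSet and(BitSet a, BitSet b) {
        return booleanQuery(new BitSet[] {a, b}, new BitSet[0]);
    }

    // Assumption: a OR b is rewritten as NOT(NOT a AND NOT b).
    BitSet or(BitSet a, BitSet b) {
        BitSet neither = booleanQuery(new BitSet[0], new BitSet[] {a, b});
        return not(neither);
    }

    public static void main(String[] args) {
        PureBooleanSketch q = new PureBooleanSketch(4);
        BitSet one = new BitSet(4);
        one.set(0);
        one.set(1);
        BitSet two = new BitSet(4);
        two.set(1);
        two.set(2);
        // (not 'one') OR (not 'two')
        System.out.println(q.or(q.not(one), q.not(two))); // {0, 2, 3}
    }
}
```

Note that nothing in this model has anywhere to hang a score, which is the limitation described above.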
