lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: Question on query syntax
Date Fri, 13 Jul 2007 01:22:37 GMT
On 12-Jul-07, at 5:58 PM, Lance Lance wrote:

> Are there any known bugs in the syntax parser? We're using  
> lucene-2.2.0 and
> Solr 1.2.
>
> We have documents with searchable text and a field 'collection'.
>
> This query works as expected, finding everything except for  
> collections
> 'pile1' and 'pile2'.
>
>     text -(collection:pile1 OR collection:pile2)
>
> When we apply De Morgan's Law, we get 0 records:
>
>     text (-collection:pile1 AND -collection:pile2)
>
> This should return all records, but it returns nothing:
>
>     text (-collection:pile1 OR -collection:pile2)

Lucene's "boolean" operators are not true boolean operators.   
Instead, every clause is one of:

OPTIONAL
REQUIRED
PROHIBITED

For a query (or a parenthesized subquery) to match, all REQUIRED
clauses must match, no PROHIBITED clause may match, and if there
are no REQUIRED clauses, at least one OPTIONAL clause must match.  A
query (or subquery) cannot consist only of PROHIBITED clauses.
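As a rough illustration of the rule above, here is a small Python sketch.
It is not Lucene code: the function name, the constants, and the
representation of a document as a set of terms are all made up for the
example.  It also assumes that a clause list with only PROHIBITED clauses
matches nothing.

```python
# Hypothetical model of the three clause types; not the Lucene API.
OPTIONAL, REQUIRED, PROHIBITED = "optional", "required", "prohibited"

def clause_set_matches(clauses, doc_terms):
    """Apply the three-way rule to a flat list of (occur, term) clauses."""
    hits = {OPTIONAL: [], REQUIRED: [], PROHIBITED: []}
    for occur, term in clauses:
        hits[occur].append(term in doc_terms)
    if any(hits[PROHIBITED]):      # no prohibited clause may match
        return False
    if hits[REQUIRED]:             # every required clause must match
        return all(hits[REQUIRED])
    return any(hits[OPTIONAL])     # otherwise at least one optional clause

doc = {"hello", "world"}
print(clause_set_matches([(REQUIRED, "hello"), (PROHIBITED, "goodbye")], doc))  # True
print(clause_set_matches([(OPTIONAL, "hello"), (OPTIONAL, "night")], doc))      # True
# Assumption: only prohibited clauses means nothing can match.
print(clause_set_matches([(PROHIBITED, "goodbye")], doc))                       # False
```

Note that the last call returns False even though the document does not
contain the prohibited term: with no REQUIRED or OPTIONAL clause there is
nothing for the document to match.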

The syntax for these is, respectively, no prefix, +, and -.  Each can
also be applied to an entire subquery using brackets:

+hello -(goodbye -night)

returns docs that contain hello and do not match (goodbye without night).

In Lucene, AND/OR/NOT are syntactic sugar that translates clauses to
the above form.  However, they only imperfectly match people's (rational)
expectations of how boolean operators work.  Also, brackets _create
subqueries_; they do not merely group operators.  I suggest that AND and
OR never be used programmatically, if possible.

Try these alternatives:

docs that (must) contain 'text' and do not match (col=pile1 or col=pile2):
>     text -(collection:pile1 collection:pile2)

same as above
>     text -collection:pile1 -collection:pile2

docs that (must) contain 'text' and (must) match (col=pile1 or col=pile2):
>     +text +(collection:pile1 collection:pile2)

Note that in the last example the + before text is necessary: without
it, text would be merely optional rather than required, because the
query has another required clause.
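To see how the alternatives above behave, the following Python sketch
evaluates them against a handful of toy documents.  It is only a model of
the OPTIONAL/REQUIRED/PROHIBITED rule, not Lucene or Solr code: the nested
list representation, the names matches and search, and the sample documents
are invented for illustration, and a subquery containing only prohibited
clauses is assumed to match nothing.

```python
# Hypothetical model; a query is a term string or a list of (occur, subquery).
OPT, REQ, NOT = "optional", "required", "prohibited"

def matches(query, doc):
    if isinstance(query, str):                 # leaf: a single term
        return query in doc
    results = {OPT: [], REQ: [], NOT: []}
    for occur, sub in query:                   # brackets create subqueries
        results[occur].append(matches(sub, doc))
    if any(results[NOT]):
        return False
    if results[REQ]:
        return all(results[REQ])
    return any(results[OPT])

# Made-up documents, each a set of terms.
docs = {
    "a": {"text", "collection:pile1"},
    "b": {"text", "collection:pile2"},
    "c": {"text", "collection:pile3"},
    "d": {"other", "collection:pile3"},
    "e": {"other", "collection:pile1"},
}

def search(query):
    return sorted(name for name, doc in docs.items() if matches(query, doc))

p1, p2 = "collection:pile1", "collection:pile2"

# text -(collection:pile1 collection:pile2)
q1 = [(OPT, "text"), (NOT, [(OPT, p1), (OPT, p2)])]
# text -collection:pile1 -collection:pile2
q2 = [(OPT, "text"), (NOT, p1), (NOT, p2)]
# +text +(collection:pile1 collection:pile2)
q3 = [(REQ, "text"), (REQ, [(OPT, p1), (OPT, p2)])]
# text +(collection:pile1 collection:pile2)   (the + on text dropped)
q3_no_plus = [(OPT, "text"), (REQ, [(OPT, p1), (OPT, p2)])]
# +text +(-collection:pile1 -collection:pile2)   (prohibited-only subquery)
q4 = [(REQ, "text"), (REQ, [(NOT, p1), (NOT, p2)])]

print(search(q1))          # ['c']
print(search(q2))          # ['c']
print(search(q3))          # ['a', 'b']
print(search(q3_no_plus))  # ['a', 'b', 'e']
print(search(q4))          # []
```

In this model, dropping the + from text lets document e through even though
it does not contain text, and requiring a subquery made only of prohibited
clauses empties the result set.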

-Mike




