lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject lucene parser, negative OR operands
Date Tue, 17 May 2011 22:57:50 GMT
(Changed the subject for this topic.) Weird. I'm seeing the wrong behavior myself,
and have been for a while; I even wrote some custom pre-processor logic at my
app level to work around it. Weird, I dunno.

Wait. "Queries with -one OR -two return fewer documents than either
operand does on its own."

Wait, that's exactly what's wrong, isn't it? How can fewer documents match
"-one OR -two" than match "-one" alone? If there are X documents that do
not have "one" in them, there can't be fewer than X documents that EITHER
do not have "one" OR do not have "two" (i.e., documents that do not have
BOTH one and two), can there? We didn't ask for "-one AND -two", we asked
for "-one OR -two".
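The set algebra in the paragraph above can be checked with a toy model. This is a minimal Python sketch, not anything taken from Solr: the four-document corpus and the `has`/`neg` helpers are hypothetical, and each negative clause is modeled as "all documents minus those containing the term".

```python
# Toy model of the intended set semantics of negative clauses.
# The corpus and the has/neg helpers are hypothetical illustrations,
# not Solr data or Solr APIs.
corpus = {1: {"one"}, 2: {"two"}, 3: {"one", "two"}, 4: set()}
all_docs = set(corpus)


def has(term):
    """IDs of documents containing `term`."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}


def neg(term):
    """IDs of documents NOT containing `term`: all docs minus has(term)."""
    return all_docs - has(term)


minus_one = neg("one")                        # "-one"
minus_one_or_two = neg("one") | neg("two")    # "-one OR -two": union
minus_one_and_two = neg("one") & neg("two")   # "-one AND -two": intersection

# A union is never smaller than either of its operands.
assert minus_one <= minus_one_or_two
# De Morgan: "-one OR -two" == NOT (one AND two).
assert minus_one_or_two == all_docs - (has("one") & has("two"))

print(sorted(minus_one))          # [2, 4]
print(sorted(minus_one_or_two))   # [1, 2, 4]
print(sorted(minus_one_and_two))  # [4]
```

Under this model "-one OR -two" is the complement of "one AND two", so it can only be the same size as "-one" or larger. A result smaller than "-one" looks like the intersection, "-one AND -two".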

On 5/17/2011 6:42 PM, Markus Jelsma wrote:
> Hmm, that's not what I see while testing right now. Queries with -one OR -two
> return fewer documents than either operand does on its own; this is with
> LuceneQParser. I haven't done extensive testing since I rarely use boolean
> algebra in Lucene or Solr.
>
>> Oops, you're right, I had misremembered: the Solr 1.4.1 "lucene" query parser
>> handles pure negatives fine; it's Solr 1.4.1 _dismax_ that does not.
>>
>> Although, here's one, not actually related to this thread, that DOESN'T
>> work in the Solr 1.4.1 lucene query parser. I'm curious whether it's been
>> fixed in Solr 3.1.
>>
>> &defType=lucene&q=-one OR -two
>>
>> That one does NOT work as expected in Solr 1.4.1. I can't explain exactly
>> what it's doing, but it's not right: it returns FEWER results than "-one"
>> alone, which can't be right algebraically, I think. So there are still
>> some kinds of negative queries that do weird things.
>>
>> On 5/17/2011 6:29 PM, Markus Jelsma wrote:
>>> Such a negation works just as one would expect.
>>>
>>> q=*:*
>>> <result name="response" numFound="158" start="0">
>>>
>>> q=*:*&fq=-type:text/html
>>> <result name="response" numFound="25" start="0">
>>>
>>> q=*:*&fq=type:text/html
>>> <result name="response" numFound="133" start="0">
>>>
>>> Well, that adds up (25 + 133 = 158), doesn't it ;)
>>>
>>>> 1. I don't think Solr will re-use the filter cache in that situation,
>>>> although I'm not sure. But I'm commenting anyway because of something
>>>> else, not what you asked, that will trip you up with your example:
>>>>
>>>> 2. In fact, a pure-negative query like that doesn't work _at all_ in the
>>>> default solr/lucene query parser used for 'fq', at least in Solr 1.4.1.
>>>> Not sure if it's been improved in 3.1, but I don't think so. It will
>>>> always return 0 hits, because the solr/lucene query parser can't generate
>>>> a proper lucene query from a pure negative query like that.
>>>>
>>>> To get around this, you can find a variation of the query that means the
>>>> same thing but isn't in that form. Here's a really ugly one I use, with a
>>>> nested dismax (dismax ALSO has trouble with pure negatives, although I
>>>> think edismax may handle them). This weird-as-heck combo works, but
>>>> maybe there's a better way.
>>>>
>>>> NOT _query_:"{!dismax qf=something}history"
>>>>
>>>> And to come full circle, I have NO idea what effect nested
>>>> queries have on the filter cache. I think that STILL won't re-use the
>>>> filter cache... but I wonder if it'll re-use the _query_ cache for
>>>> "history"? I forget even more about how the query cache works, though.
>>>>
>>>> On 5/17/2011 6:07 PM, Burton-West, Tom wrote:
>>>>> If I have a query with a filter query such as "q=art&fq=history" and
>>>>> then run a second query "q=art&fq=-history", will Solr realize that it
>>>>> can use the cached results of the previous filter query "history" (in
>>>>> the filter cache), or will it not realize this and have to actually do a
>>>>> second filter query against the index for "not history"?
>>>>>
>>>>> Tom
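One plausible reading of the "-one OR -two" result in this thread, offered as an assumption rather than something checked against the Solr 1.4.1 source: the classic Lucene parser keeps both clauses prohibited (MUST_NOT) even after the OR, and Solr only rescues a pure-negative query at the top level by starting from all documents. The toy evaluator below (the `evaluate`/`match` helpers and the four-document corpus are hypothetical, not Solr APIs) shows that this reading reproduces "fewer than -one alone", and that writing each negation with its own match-all, as in `(*:* -one) OR (*:* -two)`, should restore the intended union.

```python
# Toy evaluator for boolean queries over a made-up corpus.
# ASSUMPTIONS (not verified against the Solr 1.4.1 source):
#   * "-one OR -two" parses to two MUST_NOT clauses (the OR does not
#     turn a prohibited clause into an optional one);
#   * Solr rescues a pure-negative query only at the top level, by
#     subtracting it from all documents; a nested query with only
#     MUST_NOT clauses matches nothing, as in plain Lucene.
corpus = {1: {"one"}, 2: {"two"}, 3: {"one", "two"}, 4: set()}
all_docs = set(corpus)


def has(term):
    """IDs of documents containing `term`."""
    return {doc_id for doc_id, terms in corpus.items() if term in terms}


def match(q):
    """A clause body is a term, "*:*" (match all), or a nested clause list."""
    if isinstance(q, str):
        return set(all_docs) if q == "*:*" else has(q)
    return evaluate(q, top_level=False)


def evaluate(clauses, top_level=True):
    """Evaluate (occur, body) pairs; occur is MUST, SHOULD or MUST_NOT."""
    must = [match(q) for occur, q in clauses if occur == "MUST"]
    should = [match(q) for occur, q in clauses if occur == "SHOULD"]
    must_not = [match(q) for occur, q in clauses if occur == "MUST_NOT"]
    if must:
        result = set.intersection(*must)  # SHOULD clauses only affect scoring here
    elif should:
        result = set.union(*should)       # at least one SHOULD clause must match
    elif top_level:
        result = set(all_docs)            # assumed top-level pure-negative fix-up
    else:
        result = set()                    # nested pure negative matches nothing
    for excluded in must_not:
        result -= excluded
    return result


# What "-one OR -two" is assumed to parse to: both clauses prohibited.
parsed = [("MUST_NOT", "one"), ("MUST_NOT", "two")]
print(sorted(evaluate(parsed)))     # [4], i.e. documents with neither term

# "(*:* -one) OR (*:* -two)": each negation gets its own match-all.
rewritten = [
    ("SHOULD", [("SHOULD", "*:*"), ("MUST_NOT", "one")]),
    ("SHOULD", [("SHOULD", "*:*"), ("MUST_NOT", "two")]),
]
print(sorted(evaluate(rewritten)))  # [1, 2, 4]
```

The "match-all at top level only" rule is kept explicit because it is what makes the two forms differ: in the rewritten query each nested clause carries its own positive `*:*`, so the outer OR takes a real union instead of collapsing into "neither term".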
