lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: BooleanQuery rewrite optimization
Date Sat, 13 Aug 2016 12:26:12 GMT
The explanation makes sense, I think you're right. Even though I don't
think this optimization would be used often, it would certainly help
performance when it is used.

On Sat, 13 Aug 2016 at 12:21, Spyros Kapnissis <skapni@yahoo.com.invalid>
wrote:

> OK, I had some time to look a bit further into it. It seems that a clause
> that is both SHOULD and FILTER is indeed equivalent to a MUST (AND) clause
> with minShouldMatch decremented by one, in terms of both the number of
> results and the score.
> My reasoning goes like this:
> Given:
> - a single clause that is both a SHOULD and a FILTER is equivalent to a
>   single MUST clause
> - minShouldMatch only applies to SHOULD clauses
> - FILTER does not influence the score
> So:
> - during the transformation we remove one SHOULD clause by turning it
>   into a MUST, so minShouldMatch is decremented by one
> - this "one" was always covered before by the FILTER and is now covered by
>   the MUST clause, so the results remain exactly the same, all other things
>   being equal
> Not really a mathematical proof :) but I verified it through quite a bit of
> testing and it seems to always be the case. Hope I am not missing
> something. I can provide a patch.
> By the way, would there be any real-life performance differences between
> the two cases, e.g. through filter caching?
>
>     On Wednesday, August 10, 2016 12:53 PM, Adrien Grand <
> jpountz@gmail.com> wrote:
>
>
>  I'm not awake enough to figure out whether the -1 trick is right or not,
> but if you manage to prove it somehow, patches to simplify boolean queries
> at rewrite time are welcome!
>
> On Tue, 9 Aug 2016 at 00:47, Spyros Kapnissis <skapni@yahoo.com.invalid>
> wrote:
>
> > Hm, I hadn't really thought about the minShouldMatch part. I thought it'd
> > be covered, but I see your point about it being semantically different if
> > you keep it as is.
> > However, running your edge case example on an actual local index I get
> > the following:
> > - "(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=2 =>
> >   same top score, fewer results in the second case
> > - "(X X Y #X)" w/minshouldmatch=2 vs. "(+X X Y)" w/minshouldmatch=1 =>
> >   same top score, same number of results
> > - "(X X X Y #X)" w/minshouldmatch=3 vs. "(+X X X Y)" w/minshouldmatch=2 =>
> >   same top score, same number of results
> > But I'm still not really convinced that decrementing minshouldmatch by 1
> > will do the trick. I'll have to verify; maybe I'll try more examples to
> > see if it holds as a general case. Nice exercise either way :)
> >
> >
> >
> >    On Tuesday, August 9, 2016 12:40 AM, Chris Hostetter <
> > hossman_lucene@fucit.org> wrote:
> >
> >
> >
> > Off the top of my head, i think any optimization like that would also need
> > to account for minNrShouldMatch, wouldn't it?
> >
> > if your query is "(X Y Z #X)" w/minshouldmatch=2, and you rewrite that
> > query to "(+X Y Z)" w/minshouldmatch=2 you now have a semantically diff
> > query that won't match as many documents as the original.
> >
> > in that example, you could decrement minshouldmatch (=1) ... but i'm not
> > sure if that holds as a general rule for all possible permutations/values
> > ... i'd have to think about it.
> >
> > An interesting edge case to think about is "(X X Y #X)" w/minshouldmatch=2
> > ... pretty sure that would give you very diff scores if you rewrote it to
> > "(+X X Y)" (or "(+X Y)") w/minshouldmatch=1
> >
> >
> >
> > : Hello all, I noticed while debugging a query that BooleanQuery will
> > : rewrite itself to remove FILTER clauses that are also MUST as an
> > : optimization/simplification, which makes total sense. So (+f:x #f:x)
> > : will become (+f:x). However, shouldn't there also be another
> > : optimization to remove FILTER clauses that are also SHOULD, while
> > : converting them to MUST? So, e.g., the query (f:x #f:x) will become
> > : (+f:x). I did an initial simple implementation and the tests seem to
> > : pass. Are there any cases where this does not hold?
> > :
> > :
> >
> > -Hoss
> > http://www.lucidworks.com/
> >
> >
> >
> >
>
>
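
The matching half of the argument in this thread can be checked by brute force on a toy model. The sketch below is not Lucene code and does not use Lucene's API: `ToyQuery`, `rewrite`, `query`, and `sameMatches` are hypothetical names introduced here, each clause is a single term, a document is just a set of terms, and scoring is not modeled at all. It assumes that a document matches when every MUST and FILTER term is present and at least minShouldMatch SHOULD clauses match, counting duplicate clauses separately.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BooleanRewriteCheck {

    /** Toy boolean query over single-term clauses (an assumption, not Lucene's BooleanQuery). */
    static final class ToyQuery {
        final List<String> must = new ArrayList<>();
        final List<String> filter = new ArrayList<>();
        final List<String> should = new ArrayList<>();
        int minShouldMatch = 0;

        boolean matches(Set<String> doc) {
            if (!doc.containsAll(must) || !doc.containsAll(filter)) {
                return false;
            }
            int matched = 0;
            for (String term : should) {
                if (doc.contains(term)) {
                    matched++; // duplicate SHOULD clauses each count once
                }
            }
            // With no required clause at all, at least one SHOULD clause has to match.
            boolean noRequired = must.isEmpty() && filter.isEmpty();
            int needed = noRequired ? Math.max(1, minShouldMatch) : minShouldMatch;
            return matched >= needed;
        }
    }

    /** Builds a toy query from SHOULD terms, FILTER terms, and minShouldMatch. */
    static ToyQuery query(List<String> should, List<String> filter, int minShouldMatch) {
        ToyQuery q = new ToyQuery();
        q.should.addAll(should);
        q.filter.addAll(filter);
        q.minShouldMatch = minShouldMatch;
        return q;
    }

    /** The rewrite discussed in the thread: a term that is both FILTER and SHOULD
     *  becomes MUST, one SHOULD copy is dropped, and minShouldMatch is decremented. */
    static ToyQuery rewrite(ToyQuery q) {
        ToyQuery out = new ToyQuery();
        out.must.addAll(q.must);
        out.should.addAll(q.should);
        out.minShouldMatch = q.minShouldMatch;
        for (String f : q.filter) {
            if (out.should.remove(f)) { // removes only the first matching SHOULD clause
                out.must.add(f);
                out.minShouldMatch = Math.max(0, out.minShouldMatch - 1);
            } else {
                out.filter.add(f);
            }
        }
        return out;
    }

    /** Compares two queries on every possible document over a small vocabulary. */
    static boolean sameMatches(ToyQuery a, ToyQuery b, List<String> vocab) {
        for (int mask = 0; mask < (1 << vocab.size()); mask++) {
            Set<String> doc = new HashSet<>();
            for (int i = 0; i < vocab.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    doc.add(vocab.get(i));
                }
            }
            if (a.matches(doc) != b.matches(doc)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("X", "Y", "Z");
        // The three shapes mentioned in the thread.
        ToyQuery[] originals = {
            query(Arrays.asList("X", "Y", "Z"), Arrays.asList("X"), 2),
            query(Arrays.asList("X", "X", "Y"), Arrays.asList("X"), 2),
            query(Arrays.asList("X", "X", "X", "Y"), Arrays.asList("X"), 3),
        };
        for (ToyQuery q : originals) {
            System.out.println(sameMatches(q, rewrite(q), vocab));
        }
    }
}
```

This only covers which documents match. Whether scores are preserved, for example in the duplicated-clause case "(X X Y #X)", depends on how Lucene scores the promoted clause and would have to be tested against a real index, as was done earlier in the thread.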
