lucene-java-user mailing list archives

From Benson Margulies <>
Subject Re: DisjunctionMaxQuery and scoring
Date Thu, 19 Apr 2012 21:15:59 GMT
On Thu, Apr 19, 2012 at 5:10 PM, Robert Muir <> wrote:
> On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies <> wrote:
>> On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir <> wrote:
>>> On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies <> wrote:
>>>> On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir <> wrote:
>>>>> On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies <> wrote:
>>>>>> I am trying to solve a problem using DisjunctionMaxQuery.
>>>>>> Consider a query like:
>>>>>> a:b OR c:d OR e:f OR ...
>>>>>> name:richard OR name:dick OR name:dickie OR name:rich ...
>>>>>> At most, one of the richard names matches. So the match score gets
>>>>>> dragged down by the long list of things that don't match, as the
>>>>>> list can get quite long.
>>>>>> It seemed to me, upon reading the documentation, that I could cure
>>>>>> this problem by creating a query tree that used DisjunctionMaxQuery
>>>>>> around all those nicknames. However, when I built a boolean query that
>>>>>> had, as a clause, a DisjunctionMaxQuery in the place of a pile of
>>>>>> these individual Term queries, the score and the explanation did not
>>>>>> change at all -- in particular, the coord term shows the same number
>>>>>> of total terms. So it looks as if the children of the disjunction
>>>>>> still count.
>>>>>> Is there a way to control that term? Or a better way to express this?
>>>>>> Thinking SQL for a moment, what I'm trying to express is
>>>>>>   name IN (richard, dick, dickie, rich)
>>>>> I think you just want to disable coord() here? You can do this for
>>>>> that particular boolean query by passing true to the ctor:
>>>>>  public BooleanQuery(boolean disableCoord)
>>>> Rob,
>>>> How do nested queries work with respect to this? If I build a boolean
>>>> query one of whose clauses is a BooleanQuery with coord turned off,
>>>> do only the nested query's clauses get left out of 'coord'?
>>>> If so, then your answer certainly seems to be what the doctor ordered.
>>> It applies only to that query itself. So if this BQ is a clause of
>>> another BQ that has coord enabled,
>>> that would not change the top-level BQ's coord.
>>> Note: if you don't want coord at all, then you can also plug in a
>>> Similarity that returns 1,
>>> or pick another Similarity like BM25: in trunk only the vector space
>>> impl even does anything for coord()....
>> Robert, I'm sorry that my density is approaching lead. My problem is
>> that I want coord, but I want to control which terms are counted and
>> which are not. I suppose I can accomplish this with my own scorer. My
>> hope was that there was a way to express "This group of terms counts
>> as one for coord".
> So just structure your boolean query appropriately?
> BQ1(coord=true)
>  BQ2(coord=false): 25 terms
>  BQ3(coord=false): 87 terms
> BQ1's coord is based on how many subscorers match (out of 2, BQ2 and
> BQ3). If both match it's 2/2, otherwise 1/2.
> But in this example BQ2 and BQ3 disable coord themselves, hiding the
> fact they accept 25 and 87 terms respectively and appearing as a
> single sub for coord().
> Does this make sense? You can extend this idea to control this however
> you want by structuring the BQ appropriately so your BQs with
> "synonyms" have coord=false


This makes perfect sense, it is what I thought you meant to begin
with. I tried it and thought that it did not work. Or, perhaps, I am
misreading the 'explain' output. Or, more likely, I goofed altogether.
I'll go back and recheck my results and post some explain output if I
can't find my mistake.
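
The arithmetic Robert describes can be checked with a toy model. The sketch
below is plain Java, not Lucene code: the class CoordSketch and its methods
are hypothetical names introduced here for illustration. The only things taken
from the thread are that coord is the number of matching sub-clauses divided
by the total number of sub-clauses, and that a nested query with coord
disabled appears to its parent as a single sub-clause.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the coord arithmetic discussed in this thread (not Lucene code). */
public class CoordSketch {

    /** Fraction of sub-clauses that matched: overlap / maxOverlap. */
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Flat query: every term is its own clause of the top-level query. */
    public static float flatCoord(List<boolean[]> groups) {
        int matched = 0;
        int total = 0;
        for (boolean[] group : groups) {
            for (boolean hit : group) {
                total++;
                if (hit) {
                    matched++;
                }
            }
        }
        return coord(matched, total);
    }

    /**
     * Nested query: each group is one clause of the top-level query.
     * A group counts as matched if any of its terms matched, so the
     * size of the group is hidden from the top-level coord.
     */
    public static float nestedCoord(List<boolean[]> groups) {
        int matched = 0;
        for (boolean[] group : groups) {
            for (boolean hit : group) {
                if (hit) {
                    matched++;
                    break; // one hit is enough for the whole group
                }
            }
        }
        return coord(matched, groups.size());
    }

    public static void main(String[] args) {
        // Group 1: richard, dick, dickie, rich; only "richard" matches.
        boolean[] names = {true, false, false, false};
        // Group 2: a single unrelated term that matches.
        boolean[] other = {true};
        List<boolean[]> groups = Arrays.asList(names, other);

        // Flat: 2 matching terms out of 5 clauses.
        System.out.println("flat coord   = " + flatCoord(groups));
        // Nested: 2 matching groups out of 2 clauses.
        System.out.println("nested coord = " + nestedCoord(groups));
    }
}
```

In this model the flat query is penalized for the three nicknames that did
not match, while the nested form treats the whole nickname group as one
clause, which is the "counts as one for coord" behavior asked about earlier
in the thread.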

