lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7621) Per-document minShouldMatch
Date Fri, 18 Aug 2017 15:49:00 GMT


Adrien Grand updated LUCENE-7621:
    Attachment: LUCENE-7621.patch

Here is a patch that adds such a query to the sandbox. It can't be as smart as {{MinShouldMatchSumScorer}}
because it can't predict the number of required clauses, but it is
still smarter than a {{DisjunctionSumScorer}} whose two-phase iterator would check {{freq()}}.

> Per-document minShouldMatch
> ---------------------------
>                 Key: LUCENE-7621
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7621.patch
> I have seen similar requirements a couple of times but could not find any related issue,
so I am opening one now. The idea would be to allow passing a {{LongValuesSource}} rather
than an integer as the {{minShouldMatch}} parameter of {{BooleanQuery}} so that the number
of required clauses can depend on the document that is being matched. In terms of implementation,
it looks like it would be straightforward, as we would just have to update the value of {{minShouldMatch}}
in {{MinShouldMatchSumScorer.setDocAndFreq}} and things would still be efficient, i.e. we would
still use advance on the costly clauses.
> This kind of feature would make it possible to run queries that must match e.g. 80% of the terms
that a document contains (by indexing the number of terms in a separate field). It would also
make it possible for Luwak or ES' percolator to index boolean queries that have a value of
{{minShouldMatch}} greater than 1 more efficiently.
> I do not have any plans to work on it soon, but I am curious how much interest this feature
would attract.
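
To make the proposed semantics concrete, here is a toy sketch in plain Java. It is not the attached patch and does not use Lucene's API: every name in it (PerDocMinShouldMatch, Doc, termCount, eightyPercent, matches) is hypothetical. It assumes each document stores its term count separately, and it reads the threshold through a function of the document, in the spirit of passing a {{LongValuesSource}} rather than an integer. It simply counts matching clauses per document, so it illustrates the matching rule only, not the advance-based optimizations discussed above.

```java
import java.util.List;
import java.util.Set;
import java.util.function.ToLongFunction;

/** Toy model of a per-document minShouldMatch. Hypothetical names; this is not Lucene code. */
public class PerDocMinShouldMatch {

    /** Stand-in for an indexed document: its terms plus a separately stored term count. */
    public static final class Doc {
        final Set<String> terms;
        final long termCount;

        public Doc(Set<String> terms) {
            this.terms = terms;
            this.termCount = terms.size();
        }
    }

    /** Threshold of 80% of the document's terms, rounded up, using integer arithmetic. */
    public static long eightyPercent(Doc doc) {
        return (4 * doc.termCount + 4) / 5;
    }

    /** True if at least one clause matches and the match count reaches the document's own threshold. */
    public static boolean matches(Doc doc, List<String> clauses, ToLongFunction<Doc> minShouldMatch) {
        long matched = clauses.stream().distinct().filter(doc.terms::contains).count();
        return matched > 0 && matched >= minShouldMatch.applyAsLong(doc);
    }

    public static void main(String[] args) {
        Doc small = new Doc(Set.of("a", "b"));
        Doc large = new Doc(Set.of("a", "b", "c", "d", "e"));
        List<String> query = List.of("a", "b", "c");
        // Same query, different thresholds: 2 of 2 terms for small, 4 of 5 for large.
        System.out.println(matches(small, query, PerDocMinShouldMatch::eightyPercent)); // true
        System.out.println(matches(large, query, PerDocMinShouldMatch::eightyPercent)); // false
    }
}
```

A fixed integer threshold cannot express this: with a threshold of 2 both documents would match, and with a threshold of 4 neither would.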
