lucene-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (SOLR-11662) Make overlapping query term scoring configurable per field type
Date Sun, 03 Dec 2017 19:57:00 GMT


ASF GitHub Bot commented on SOLR-11662:

Github user ctargett commented on a diff in the pull request:
    --- Diff: solr/solr-ref-guide/src/field-type-definitions-and-properties.adoc ---
    @@ -87,6 +87,13 @@ For multivalued fields, specifies a distance between multiple values,
which prev
     `autoGeneratePhraseQueries`:: For text fields. If `true`, Solr automatically generates
phrase queries for adjacent terms. If `false`, terms must be enclosed in double-quotes to
be treated as phrases.
    +Query used to combine scores of overlapping query terms (ie synonyms). Consider a search
for "blue tee" with query-time synonyms `tshirt,tee`.
    +Use `as_same_term` (default) to blend terms, ie `SynonymQuery(tshirt,tee)` where each
term will be treated as equally important. Use `pick_best` to select the most significant
synonym when scoring `Dismax(tee,tshirt)`. Use `as_distinct_terms` to bias scoring towards
the most significant synonym `(pants OR slacks)`.
    +`as_same_term` is appropriatte when terms are true synonyms (television, tv). `pick_best`
and `as_distinct_terms` are appropriatte when synonyms are expanding to hyponyms (q=jeans
w/ jeans=>jeans,pants) and you want exact to come before parent and sibling concepts. See
this[blog article].
    --- End diff --
    Is "appropriate" spelled wrong (with an extra 't')? It's done twice so I'm not sure if
I'm perhaps misunderstanding the context.

> Make overlapping query term scoring configurable per field type
> ---------------------------------------------------------------
>                 Key: SOLR-11662
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Doug Turnbull
>             Fix For: 7.2, master (8.0)
> This patch customizes the query-time behavior when query terms overlap positions. Right
now the only option is SynonymQuery, which is a fantastic default and an improvement on past
versions. However, there are use cases where terms overlap positions but don't carry exact
synonymy relationships. Synonym filters (or other analyzers) are often used to model
hypernym/hyponym relationships. In those cases the individual term scores matter, with terms of
higher specificity (hyponyms) scoring higher than terms of lower specificity (hypernyms).
> This patch adds the fieldType setting scoreOverlaps, as in:
> {code:java}
>   <fieldType name="text_general"  scoreOverlaps="pick_best"  class="solr.TextField"
positionIncrementGap="100" multiValued="true">
> {code}
> Valid values for scoreOverlaps are:
> *as_one_term*
> Default, suited to most synonym use cases. Uses SynonymQuery.
> Treats all terms as if they're exactly equivalent, with document frequency from underlying
terms blended
> *pick_best*
> For a given document, score using the best scoring synonym (ie dismax over generated
> Useful when synonyms are not exactly equivalent and are instead used to model hypernym/hyponym
relationships, such as expansions where the term scores will reflect that quality
> IE this query time expansion
> tabby => tabby, cat, animal
> Searching "text", generates the dismax (text:tabby | text:cat | text:animal)
> *as_distinct_terms*
> (The pre 6.0 behavior.)
> Compromise between pick_best and as_one_term
> Appropriate when synonyms reflect a hypernym/hyponym relationship, but lets scores stack,
so documents with more tabby, cat, or animal the better w/ a bias towards the term with highest
> Terms are turned into a boolean OR query, with document frequencies not blended
> IE this query time expansion
> tabby => tabby, cat, animal
> Searching "text", generates the boolean query (text:tabby  text:cat text:animal)
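
As a rough illustration of how pick_best and as_distinct_terms differ in the description above, here is a toy Java sketch that combines per-term scores for one document. It is not Solr or Lucene code: the class name, method names, and the scores are hypothetical, and it assumes a dismax with a tie-breaker of 0 (a plain maximum) and a boolean OR that simply sums the scores of its matching clauses. The as_one_term case is left out because it depends on blended document frequencies, which this toy does not model.

```java
import java.util.Arrays;

/** Toy model of combining per-synonym scores; hypothetical, not Solr or Lucene code. */
final class OverlapScoringSketch {

    /** pick_best: dismax-style, keep only the highest-scoring synonym (tie-breaker assumed 0). */
    static double pickBest(double[] termScores) {
        return Arrays.stream(termScores).max().orElse(0.0);
    }

    /** as_distinct_terms: boolean OR, the scores of all matching synonyms add up. */
    static double asDistinctTerms(double[] termScores) {
        return Arrays.stream(termScores).sum();
    }

    public static void main(String[] args) {
        // Made-up scores for one document against text:tabby, text:cat, text:animal.
        double[] scores = {3.0, 1.5, 0.5};
        System.out.println("pick_best         = " + pickBest(scores));        // 3.0
        System.out.println("as_distinct_terms = " + asDistinctTerms(scores)); // 5.0
    }
}
```

With these made-up numbers, pick_best ranks the document purely on its strongest term (tabby), while as_distinct_terms also rewards the additional cat and animal matches, which is the "scores stack" behavior mentioned above.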

