lucene-dev mailing list archives

From "Jim Ferenczi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8630) Allow boosting of particular interval sources
Date Tue, 08 Jan 2019 00:34:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736523#comment-16736523 ]

Jim Ferenczi commented on LUCENE-8630:
--------------------------------------

Setting a boost value on a leaf seems difficult, since it would also depend on the length of
each top-level interval. Moreover, we don't have any evidence that the current scoring for
intervals makes sense, so I am reluctant to add another factor to the formula. I also think
that we need to make the scoring more intuitive for intervals in general. The way we mix field
statistics and proximity in the current scoring is misleading IMO; it implies that it's a
good idea to mix interval query scores with boolean query scores even though the scores are not
comparable (we sum the IDF in the intervals).
Maybe we should compute a score that only takes the interval lengths (1 / (1 + len)) into
account and not the field statistics? I don't think it's realistic to use an interval query
to compute a score that mixes field statistics and proximity. We should try to decorrelate
these signals and add ways to mix them correctly, like the feature query does. It should be
natural, for instance, to use a simple boolean query to select a subset of documents in a first
pass and then use a rescorer with an interval query to re-rank based on proximity.
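
A minimal sketch of the two-pass idea above, written in plain Java rather than against Lucene: a boolean first pass keeps only documents containing both terms, and a second pass re-ranks them by a proximity-only score that sums 1 / (1 + len) over ordered two-term intervals. Everything here is hypothetical (the class name, the helpers, and the choice of len as the number of tokens between the two terms); it is a toy model of the proposed scoring, not how Lucene's interval queries or its QueryRescorer are implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model: boolean first pass, proximity-only second pass.
public class ProximityRescoreSketch {

    /** Sum of 1 / (1 + len) over minimal ordered intervals of (first, second); terms assumed distinct. */
    static double proximityScore(String[] tokens, String first, String second) {
        double score = 0;
        int lastFirst = -1; // position of the most recent unmatched 'first' term
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(first)) {
                lastFirst = pos; // a later 'first' yields a shorter (minimal) interval
            } else if (tokens[pos].equals(second) && lastFirst >= 0) {
                int len = pos - lastFirst - 1; // assumption: len = tokens between the terms
                score += 1.0 / (1.0 + len);
                lastFirst = -1;
            }
        }
        return score; // no IDF or other field statistics involved
    }

    /** First pass: plain boolean conjunction, no scoring. */
    static boolean matchesAll(String[] tokens, String... terms) {
        for (String term : terms) {
            boolean found = false;
            for (String t : tokens) {
                if (t.equals(term)) { found = true; break; }
            }
            if (!found) return false;
        }
        return true;
    }

    /** Returns matching doc ids re-ranked by proximity score, best first. */
    static List<Integer> search(List<String[]> docs, String first, String second) {
        List<Integer> candidates = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (matchesAll(docs.get(id), first, second)) candidates.add(id);
        }
        candidates.sort(Comparator.comparingDouble(
                (Integer id) -> proximityScore(docs.get(id), first, second)).reversed());
        return candidates;
    }

    public static void main(String[] args) {
        List<String[]> docs = List.of(
                "brown fox and a lazy dog".split(" "),
                "the brown dog sleeps".split(" "),
                "the cat".split(" "),
                "a brown and lazy dog".split(" "));
        System.out.println(search(docs, "brown", "dog"));
    }
}
```

Running main prints [1, 3, 0]: document 1 has the two terms adjacent (score 1.0), document 3 has two tokens between them (1/3), document 0 has four (0.2), and document 2 is dropped by the first pass. The second-pass score depends only on interval lengths, so the proximity signal stays separate from term statistics.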


> Allow boosting of particular interval sources
> ---------------------------------------------
>
>                 Key: LUCENE-8630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8630
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; for example,
> in lists of synonyms you may want the original term to be weighted more, or more specific
> terms to receive higher weights than less specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken, and a 'PayloadScoreQuery',
> which allows direct modification of the score based on stored payloads but does not deal well
> with a mix of terms with and without payloads, and which ends up exposing a lot of the terms
> API, making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a float-valued 'boost()'
> method to IntervalIterator. This would make it easy to add simple boosts around particular
> terms in terms lists, and also allow more fine-grained control using payloads without having
> to expose the mechanics of the PostingsEnum.
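
The proposed 'boost()' method can be sketched in isolation. The interface below is a hypothetical, heavily simplified stand-in (Lucene's real IntervalIterator also iterates over documents and exposes more methods); the names `BoostedIntervals`, `term` and `score`, and the simple sum-of-boosts scoring, are assumptions for illustration. It shows how the original term in a synonym list could outweigh its synonyms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-interval boost; not Lucene's API.
public class IntervalBoostSketch {

    /** Simplified stand-in for an interval iterator within one document. */
    interface BoostedIntervals {
        boolean next();  // advance to the next interval; false when exhausted
        int start();     // first position of the current interval
        int end();       // last position of the current interval
        float boost();   // the proposed extra method
    }

    /** Single-term intervals, all carrying the same fixed weight. */
    static BoostedIntervals term(String[] tokens, String term, float weight) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) positions.add(i);
        }
        return new BoostedIntervals() {
            int index = -1;
            public boolean next() { return ++index < positions.size(); }
            public int start() { return positions.get(index); }
            public int end() { return positions.get(index); }
            public float boost() { return weight; }
        };
    }

    /** Deliberately simple score: sum the boost of every interval from every source. */
    static float score(List<BoostedIntervals> sources) {
        float total = 0f;
        for (BoostedIntervals source : sources) {
            while (source.next()) {
                total += source.boost();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String[] doc = "the car hit another automobile".split(" ");
        // Synonym list: the original term "car" is weighted above its synonym.
        float s = score(List.of(
                term(doc, "car", 2.0f),
                term(doc, "automobile", 1.0f)));
        System.out.println(s);
    }
}
```

For the sample document, "car" contributes 2.0 and "automobile" 1.0, so main prints 3.0. How such boosts should combine with interval lengths in nested sources is the open question raised in the comment above, and this sketch does not attempt it.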



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


