lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8630) Allow boosting of particular interval sources
Date Tue, 08 Jan 2019 09:49:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16736936#comment-16736936 ]

Alan Woodward commented on LUCENE-8630:
---------------------------------------

Interval scoring currently uses the same implementation as SpanScorer and MultiPhraseScorer,
but I agree that it would be good to separate things out a bit.  This paper suggests combining
the BM25 score and a proximity score by summing them, and has some ideas for calculating proximities:
http://www.bigdatalab.ac.cn/~gjf/papers/2012/Exploring%20and%20Exploiting%20Proximity%20Statistic%20for%20Information%20Retrieval%20Model.pdf
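
To make the "sum the two scores" idea concrete, here is a minimal sketch. It is not
the paper's formula and not Lucene code: the class name, the 1/width proximity
measure, and the weight parameter are all assumptions chosen only for illustration.

```java
public class ProximityScoreSketch {

    /**
     * Hypothetical proximity measure (an assumption, not taken from the paper):
     * each matching interval contributes 1/width, so tighter matches count more.
     * Each interval is {start, end} with inclusive positions.
     */
    public static double proximity(int[][] intervals) {
        double sum = 0.0;
        for (int[] interval : intervals) {
            int width = interval[1] - interval[0] + 1;
            sum += 1.0 / width;
        }
        return sum;
    }

    /** Combines a precomputed BM25 score with the proximity score by summing them. */
    public static double combined(double bm25, int[][] intervals, double weight) {
        return bm25 + weight * proximity(intervals);
    }

    public static void main(String[] args) {
        // Two matches: positions 3..4 (width 2) and 10..13 (width 4).
        int[][] intervals = { {3, 4}, {10, 13} };
        System.out.println(combined(2.0, intervals, 1.0));
    }
}
```

The weight is there only so the proximity contribution can be tuned against BM25;
with a weight of 0 the score falls back to plain BM25.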


> Allow boosting of particular interval sources
> ---------------------------------------------
>
>                 Key: LUCENE-8630
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8630
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8630.patch
>
>
> In positional queries, it's common to want to promote some terms over others; for example,
> in lists of synonyms you may want the original term to be weighted more, or more specific
> terms to receive higher weights than less specific ones.
> Span queries have the 'SpanBoostQuery', which is currently broken, and a 'PayloadScoreQuery',
> which allows direct modification of the score based on stored payloads, but which does not
> deal well with a mix of terms with and without payloads, and which ends up exposing a lot
> of the terms API, making it very difficult to customize.
> For interval queries, I'd like to try a different approach, adding a float-valued 'boost()'
> method to IntervalIterator.  This would make it easy to add simple boosts around particular
> terms in terms lists, and also allow more fine-grained control using payloads without having
> to expose the mechanics of the PostingsEnum.
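
A rough sketch of how a float-valued boost() on an iterator might behave is given
below. Everything in it is hypothetical: SimpleIntervalIterator is a simplified
stand-in rather than Lucene's IntervalIterator, and the wrapper, the list-backed
iterator, and the scoring loop are assumptions, not the contents of the attached patch.

```java
public class IntervalBoostSketch {

    /** Hypothetical, simplified stand-in for an interval iterator (not Lucene's class). */
    public interface SimpleIntervalIterator {
        boolean next();   // advance to the next interval; false when exhausted
        int start();      // inclusive start position of the current interval
        int end();        // inclusive end position of the current interval
        default float boost() { return 1.0f; }   // unboosted by default
    }

    /** Iterates over a fixed array of {start, end} pairs. */
    public static final class ListIntervals implements SimpleIntervalIterator {
        private final int[][] intervals;
        private int idx = -1;
        public ListIntervals(int[][] intervals) { this.intervals = intervals; }
        @Override public boolean next() { return ++idx < intervals.length; }
        @Override public int start() { return intervals[idx][0]; }
        @Override public int end() { return intervals[idx][1]; }
    }

    /** Wraps another iterator and multiplies its boost by a constant. */
    public static final class Boosted implements SimpleIntervalIterator {
        private final SimpleIntervalIterator in;
        private final float boost;
        public Boosted(SimpleIntervalIterator in, float boost) {
            this.in = in;
            this.boost = boost;
        }
        @Override public boolean next() { return in.next(); }
        @Override public int start() { return in.start(); }
        @Override public int end() { return in.end(); }
        @Override public float boost() { return in.boost() * boost; }
    }

    /** Assumed scoring loop: each interval contributes boost / width. */
    public static float score(SimpleIntervalIterator it) {
        float total = 0f;
        while (it.next()) {
            total += it.boost() / (it.end() - it.start() + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] matches = { {0, 0}, {5, 6} };
        System.out.println(score(new ListIntervals(matches)));
        System.out.println(score(new Boosted(new ListIntervals(matches), 2f)));
    }
}
```

Because the scorer in this sketch only ever calls boost(), a payload-driven
implementation could return a per-interval value from the same method without the
scorer needing access to the underlying postings.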



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
