lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7580) Spans tree scoring
Date Sun, 11 Dec 2016 21:47:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740406#comment-15740406 ]

Paul Elschot commented on LUCENE-7580:
--------------------------------------

Some scientific articles on this subject:

Metzler, Donald, and W. Bruce Croft.
"A Markov random field model for term dependencies."
Proceedings of the 28th annual international ACM SIGIR conference
on Research and development in information retrieval. ACM, 2005.

In section 2.3 they use terms, ordered phrases, and unordered phrases.
The ranking function is a weighted linear combination of these.
The optimal weights are about 80/10/10 for simple terms, unordered phrases,
and ordered phrases.
Here this led to the use of a weighting factor for non-matching occurrences.
They also found that the minimum distance is the best indicator of relevance.
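
As a rough illustration of the two ideas above (a weighted linear combination
of feature scores, and minimum distance as a proximity signal), here is a
minimal Python sketch. It is not code from the paper or from the patch: the
function names are hypothetical, the weights are just the approximate 80/10/10
split quoted above, and the per-feature scores are assumed to be computed
elsewhere.

```python
from typing import Sequence

# Assumed weights: the approximate 80/10/10 split quoted above
# (simple terms, unordered phrases, ordered phrases).
W_TERM, W_UNORDERED, W_ORDERED = 0.8, 0.1, 0.1


def combined_score(term: float, unordered: float, ordered: float) -> float:
    """Weighted linear combination of three per-document feature scores."""
    return W_TERM * term + W_UNORDERED * unordered + W_ORDERED * ordered


def min_distance(pos_a: Sequence[int], pos_b: Sequence[int]) -> int:
    """Smallest absolute gap between a position in pos_a and one in pos_b.

    Both inputs must be non-empty and sorted in ascending order.
    """
    i = j = 0
    best = abs(pos_a[0] - pos_b[0])
    while i < len(pos_a) and j < len(pos_b):
        best = min(best, abs(pos_a[i] - pos_b[j]))
        # Advance the pointer at the smaller position: moving the other
        # one could only widen the gap.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best
```

For example, min_distance([1, 10, 20], [4, 13, 30]) walks both position lists
once and returns the smallest gap found. How such a distance would feed into
the final score is left open here.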


Bendersky, Michael, and W. Bruce Croft.
"Modeling Higher-Order Term Dependencies in Information Retrieval using Query Hypergraphs."
SIGIR'12.

The concepts there can be nested, like span queries.
The approach there is much more general. For example:
- Table 2 shows the use of the frequency of a concept in various collections
to determine its weight.
- In section 2.4.2 there is an indication that the slop factor needs attention:
"... the existing term proximity measures usually capture close, sentence-level,
co-occurrences of the query terms ... The dependency range is much longer for
concept dependencies."


Blanco, Roi, and Paolo Boldi.
"Extending BM25 with multiple query operators."
Proceedings of the 35th international ACM SIGIR conference
on Research and development in information retrieval. ACM, 2012.

This scores regions with BM25F.


> Spans tree scoring
> ------------------
>
>                 Key: LUCENE-7580
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7580
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: master (7.0)
>            Reporter: Paul Elschot
>            Priority: Minor
>             Fix For: 6.x
>
>         Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and what matched


