lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8340) Allow to boost by recency
Date Mon, 03 Sep 2018 15:59:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16602299#comment-16602299 ]

Adrien Grand commented on LUCENE-8340:
--------------------------------------

I went back to this patch and did some testing with the wikimedium10m dataset
and the following query (note that I had to add a hack to also index "lastModNDV" as a LongPoint):
{code:java}
Query boostedQ = new BooleanQuery.Builder()
		.add(new TermQuery(new Term("body", "ref")), Occur.MUST)
		.add(LongPoint.newDistanceFeatureQuery("lastModNDV", 1f, 1335997132000L, 24 * 3600 * 1000), Occur.SHOULD) // within 1 day
		.build();
{code}
The maximum score of the term query is 2.07. The maximum score of the distance query is 1,
and there are 582,764 documents whose timestamp is in [1335997132000L - 24 * 3600 * 1000,
1335997132000L + 24 * 3600 * 1000], meaning their score is in [0.5, 1].
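
To make the [0.5, 1] range concrete, here is a minimal sketch of the arithmetic. It assumes the scoring formula documented for Lucene's distance feature query, weight * pivotDistance / (pivotDistance + distance); the message above does not spell the formula out. The class and method names are hypothetical, and the sketch ignores the overflow handling a real implementation would need.

```java
public class DistanceFeatureScore {

    // Assumed formula: weight * pivotDistance / (pivotDistance + |value - origin|).
    // Not robust to long overflow; for illustration only.
    public static float score(float weight, long origin, long pivotDistance, long value) {
        long distance = Math.abs(value - origin);
        return (float) (weight * ((double) pivotDistance / ((double) pivotDistance + distance)));
    }

    public static void main(String[] args) {
        long origin = 1335997132000L;     // origin timestamp from the query above
        long oneDay = 24L * 3600 * 1000;  // pivot distance: 1 day in milliseconds

        System.out.println(score(1f, origin, oneDay, origin));              // 1.0 at the origin
        System.out.println(score(1f, origin, oneDay, origin - oneDay));     // 0.5 one pivot distance away
        System.out.println(score(1f, origin, oneDay, origin + 3 * oneDay)); // 0.25 three pivot distances away
    }
}
```

Under this assumption, any document within one pivot distance of the origin scores at least half the weight, which matches the range quoted above.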

When computing the top 10 matches and counting hits, all 3,793,973 hits must be visited and
points are never read. This takes about 99ms.
When computing the top 10 matches but not counting hits (totalHitsThreshold=1), only 264,802
hits are collected (7% of matches) and the query runs in 29ms.

If I switch to more costly queries that have fewer hits, the speedup decreases, or
unfortunately even becomes a slowdown. That said, I don't think it should prevent us from adding
something like this, which is a useful addition to the scoring toolbox.

> Allow to boost by recency
> -------------------------
>
>                 Key: LUCENE-8340
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8340
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-8340.patch
>
>
> I would like us to support something like {{FeatureField.newSaturationQuery}}, but one that
> works with features that are computed dynamically, like recency or geo-distance, and is
> still optimized for top-hits collection. I'm starting with recency because it makes things
> a bit easier, even though I suspect that geo-distance might be a more common need.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)