lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6819) Deprecate index-time boosts?
Date Tue, 28 Feb 2017 17:53:45 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6819:
---------------------------------
    Attachment: LUCENE-6819-wip.patch

Here's a patch in case someone would like to run some relevancy tests. It goes even further
and uses a completely different encoding that stores lengths in a byte. It is fully accurate
up to 40, and then accuracy degrades linearly with the log of the length. Its restriction is
that it does not support index boosts. On the other hand, assuming that index boosts are not
used allows it to make all 256 values useful, whereas with the current encoding, if index
boosts are not used, only 63 values represent valid lengths: the other values are either
less than 1 or greater than MAX_VALUE.
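
To make the idea concrete, here is a minimal sketch of one way such a byte encoding could
work. It is not taken from the patch: the class name LengthByteCodec and its method names are
made up for illustration, and the layout (a float-like code with a 4-bit mantissa, plus an
exact range for small lengths) is an assumption chosen because it happens to be exact up to 40.

```java
// Hypothetical illustration only; names and layout are assumptions, not the patch's code.
final class LengthByteCodec {

    // Float-like code for a non-negative long: 3 stored mantissa bits plus an
    // implicit leading bit, with the shift stored in the upper bits.
    static int longToInt4(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("negative value: " + i);
        }
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return (int) i; // small values 0..7 are stored as-is
        }
        int shift = numBits - 4;
        int encoded = (int) (i >>> shift); // keeps the top 4 bits, in 8..15
        encoded &= 0x07;                   // drop the implicit leading bit
        encoded |= (shift + 1) << 3;       // store the exponent
        return encoded;
    }

    // Inverse of longToInt4; returns the lower bound of the encoded bucket.
    static long int4ToLong(int i) {
        long bits = i & 0x07;
        int shift = (i >>> 3) - 1;
        if (shift == -1) {
            return bits; // small value stored as-is
        }
        return (bits | 0x08) << shift;
    }

    // The float-like code of Integer.MAX_VALUE is below 255, so some byte
    // values would stay unused. They are spent on storing small lengths exactly.
    static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
    static final int NUM_FREE_VALUES = 255 - MAX_INT4;

    static byte encode(int length) {
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        if (length < NUM_FREE_VALUES) {
            return (byte) length; // exact range
        }
        return (byte) (NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES));
    }

    static int decode(byte b) {
        int i = Byte.toUnsignedInt(b);
        if (i < NUM_FREE_VALUES) {
            return i;
        }
        return Math.toIntExact(NUM_FREE_VALUES + int4ToLong(i - NUM_FREE_VALUES));
    }
}
```

With this layout, decode(encode(n)) returns n for every n from 0 to 40. Above that, lengths
are rounded down to the start of their bucket, and bucket width doubles each time the length
roughly doubles. A scheme like this leaves no room in the byte for a boost factor, which is
why it only works if index boosts are not used.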

The patch is just a proof of concept and does not try to tackle the removal of index-time
boosts or backward compatibility, which are the hard problems here.

> Deprecate index-time boosts?
> ----------------------------
>
>                 Key: LUCENE-6819
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6819
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6819-wip.patch
>
>
> Follow-up of this comment: https://issues.apache.org/jira/browse/LUCENE-6818?focusedCommentId=14934801&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14934801
> Index-time boosts are a very expert feature whose behaviour is tied to the Similarity
> impl. Additionally, users have often been confused by the poor precision due to the fact that
> we encode values on a single byte. But now we have doc values, which allow you to encode any
> values the way you want with as much precision as you need, so maybe we should deprecate index-time
> boosts and recommend encoding index-time scoring factors into doc values fields instead.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

