lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1360) A Similarity class which has unique length norms for numTerms <= 10
Date Thu, 27 Jan 2011 21:06:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987757#action_12987757 ]

Robert Muir commented on LUCENE-1360:
-------------------------------------

bq. Unfortunately, that value is packed in such a way that it gives the same value for 1-10
words in a field.

Lance, this is a bit misleading. Only lengths {3,4}, {6,7}, and {8,9,10} share the same values.

For most uses, it isn't really a big deal that a few lengths quantize to the same
byte.

If you care about this, use SmallFloat.floatToByte52/byteToFloat52 as I suggested. Then they
are all unique.
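
To make the collisions concrete, here is a rough standalone sketch. It assumes the default norm is 1/sqrt(numTerms) and that floatToByte below mirrors the generic packing in SmallFloat (3 mantissa bits with zero exponent 15 for the default encoding, 5 and 2 for the "52" variant). The class and helper names are made up for illustration; in real code you would call SmallFloat.floatToByte315 / SmallFloat.floatToByte52 directly.

```java
import java.util.HashSet;
import java.util.Set;

class NormQuantizationDemo {

    // Illustrative port of the generic float-to-byte packing (an assumption
    // about how SmallFloat works, not the library code itself).
    static byte floatToByte(float f, int numMantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << numMantissaBits;
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - numMantissaBits);
        if (smallfloat <= fzero) {
            // zero and negatives map to 0, underflow to the smallest non-zero code
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        } else if (smallfloat >= fzero + 0x100) {
            return (byte) -1; // overflow maps to the largest code
        }
        return (byte) (smallfloat - fzero);
    }

    // Encodes 1/sqrt(n) for n = 1..maxTerms; index n-1 holds the code for length n.
    static byte[] encodeLengthNorms(int maxTerms, int numMantissaBits, int zeroExp) {
        byte[] codes = new byte[maxTerms];
        for (int n = 1; n <= maxTerms; n++) {
            float norm = (float) (1.0 / Math.sqrt(n));
            codes[n - 1] = floatToByte(norm, numMantissaBits, zeroExp);
        }
        return codes;
    }

    // Counts how many different byte codes appear in the array.
    static int distinctCount(byte[] codes) {
        Set<Byte> seen = new HashSet<>();
        for (byte b : codes) {
            seen.add(b);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        byte[] coarse = encodeLengthNorms(10, 3, 15); // default-style packing
        byte[] fine = encodeLengthNorms(10, 5, 2);    // "52"-style packing
        for (int n = 1; n <= 10; n++) {
            System.out.println("numTerms=" + n
                    + " code315=" + coarse[n - 1]
                    + " code52=" + fine[n - 1]);
        }
        System.out.println("distinct codes (315): " + distinctCount(coarse));
        System.out.println("distinct codes (52):  " + distinctCount(fine));
    }
}
```

Under these assumptions the 3/15 packing gives equal codes within {3,4}, {6,7} and {8,9,10}, while the 5/2 packing keeps all ten lengths apart. The price is a narrower representable range, which is why it only makes sense for short fields.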

You can also do this on a per-field basis now, e.g. only for your small fields... that's why
I recommended we close this issue as obsolete.


> A Similarity class which has unique length norms for numTerms <= 10
> -------------------------------------------------------------------
>
>                 Key: LUCENE-1360
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1360
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Sean Timm
>            Assignee: Otis Gospodnetic
>            Priority: Trivial
>         Attachments: LUCENE-1380 visualization.pdf, ShortFieldNormSimilarity.java
>
>
> A Similarity class which extends DefaultSimilarity and simply overrides lengthNorm.
> lengthNorm is implemented as a lookup for numTerms <= 10, else as {{1/sqrt(numTerms)}}.
> This prevents term counts below 11 from having the same lengthNorm after it is stored as a
> single byte in the index.
> This is useful if your search is only on short fields such as titles or product descriptions.
> See mailing list discussion: http://www.nabble.com/How-to-boost-the-score-higher-in-case-user-query-matches-entire-field-value-than-just-some-words-within-a-field-td19079221.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

