lucene-solr-user mailing list archives

From kchellappa <>
Subject Use BM25Similarity for title field and default for others
Date Fri, 05 Apr 2013 22:58:32 GMT
We want field length to have less influence on the score for the title
field (we don't want to disable it completely), so that we get the
following behavior:

    Docs with more hits in the title rank higher
    Docs with shorter titles rank higher if the hits are equal.

The DefaultSimilarity didn't always give us this (shorter titles were
preferred over longer titles with more hits).

Note: we use edismax and search across title and other fields (like body).

In order to solve this, we use BM25Similarity with a small value of b for
the title field. We ended up using the SchemaSimilarityFactory as the global
similarity in order to use BM25Similarity for the title field. This gave us
the results we were looking for with respect to the title field.
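
To illustrate why a small b helps, here is a minimal sketch of the BM25
term-frequency component with length normalization. The class name, the
title lengths, the average length, and the k1/b values are made up for
illustration, and idf and the other scoring factors are left out, so this
is not what Solr computes end to end.

```java
public class Bm25LengthDemo {

    // BM25 term-frequency component with length normalization:
    //   tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
    // idf is omitted because it is the same for both documents compared here.
    static double tfNorm(double tf, double docLen, double avgDocLen,
                         double k1, double b) {
        double lengthFactor = 1.0 - b + b * (docLen / avgDocLen);
        return tf * (k1 + 1.0) / (tf + k1 * lengthFactor);
    }

    public static void main(String[] args) {
        double k1 = 1.2;        // assumed value
        double avgLen = 5.0;    // hypothetical average title length in terms

        // Hypothetical titles: 8 terms with 2 hits vs. 2 terms with 1 hit.
        for (double b : new double[] {0.75, 0.1}) {
            double longTwoHits = tfNorm(2, 8, avgLen, k1, b);
            double shortOneHit = tfNorm(1, 2, avgLen, k1, b);
            System.out.printf("b=%.2f  long title, 2 hits=%.4f  short title, 1 hit=%.4f%n",
                    b, longTwoHits, shortOneHit);
        }
    }
}
```

With these made-up numbers, b = 0.75 ranks the short title with one hit
above the long title with two hits, while b = 0.1 reverses that order and
still prefers the shorter title when the hit counts are equal.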

We also have keyword, tag and other metadata fields, and we want them to be
treated mostly as filters and not influence the score at all. Because we use
the SchemaSimilarityFactory, even though the non-title fields get
DefaultSimilarity, the scoring is not the same as with
DefaultSimilarityFactory, and so we have situations where the metadata
fields dominate the score (because PerFieldSimilarityWrapper uses a
queryNorm of 1.0).

We think we have the following options to fix this issue:

    a)   Use BM25Similarity for all fields and adjust the k1, b values as
    b)   Send the metadata field clauses as part of fq instead of q (but we
might have a lot of dynamically generated clauses, and we are not sure fq is
well suited for these, as we don't want them to be cached since they could
vary from request to request)
    c)   Associate a boost of zero with the metadata fields in the query
    d)   Extend the SchemaSimilarityFactory and write custom code (at this
point, I am not sure what the custom class should do)
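
For concreteness, options (b) and (c) could be sketched as request
parameters like the following. The class and method names, the field names
in qf, and the example clause are hypothetical. The {!cache=false} local
parameter is my understanding of how to keep a filter out of the filter
cache, and I have not verified how a zero boost interacts with the rest of
the score.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataQueryOptions {

    // Option (b): metadata clause as a filter query that is not cached.
    static Map<String, String> asUncachedFilter(String userQuery, String metadataClause) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery);
        params.put("qf", "title body");                     // hypothetical field list
        params.put("fq", "{!cache=false}" + metadataClause); // skip the filter cache
        return params;
    }

    // Option (c): metadata clause stays in q as a required clause with boost 0.
    static Map<String, String> withZeroBoost(String userQuery, String metadataClause) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("defType", "edismax");
        params.put("q", userQuery + " +(" + metadataClause + ")^0");
        params.put("qf", "title body");                     // hypothetical field list
        return params;
    }

    public static void main(String[] args) {
        // "tag:howto" is a made-up metadata clause.
        System.out.println(asUncachedFilter("solr scoring", "tag:howto"));
        System.out.println(withZeroBoost("solr scoring", "tag:howto"));
    }
}
```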

Are these correct? Do we have any other options? Any advice on which option
is better?
I appreciate any input on this.
