lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: SolrCloud different score for same document on different replicas.
Date Thu, 05 Jan 2017 13:52:58 GMT
Hello - you need a custom similarity that uses docCount instead of maxDoc as the document count when calculating
IDF. I believe this was fixed in some version, but I'm not sure.
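
A minimal sketch of that swap. It is written as plain Java rather than against Lucene's Similarity classes, whose hooks differ between versions. It assumes the classic TF-IDF form idf = 1 + ln(N / (docFreq + 1)) and lets N come either from maxDoc or from the field's docCount (the number of documents with at least one term in the field). FieldStats, MAX_DOC and DOC_COUNT are hypothetical names for illustration, and the numbers in main are made up. One caution: Lucene documents its per-field term statistics as still including deleted documents that have not been merged away, so docCount and docFreq may also differ between replicas.

```java
import java.util.function.ToLongFunction;

public class IdfSketch {

    /** Hypothetical per-field statistics for one replica (illustrative, not Lucene's API). */
    static final class FieldStats {
        final long maxDoc;   // all documents in the index, including deleted ones
        final long docCount; // documents that have at least one term in the field
        final long docFreq;  // documents containing the query term

        FieldStats(long maxDoc, long docCount, long docFreq) {
            this.maxDoc = maxDoc;
            this.docCount = docCount;
            this.docFreq = docFreq;
        }
    }

    /** Classic TF-IDF style IDF (assumed form): 1 + ln(n / (docFreq + 1)). */
    static double idf(long docFreq, long n) {
        return 1.0 + Math.log(n / (double) (docFreq + 1));
    }

    /** IDF where the collection size n is chosen by a pluggable strategy. */
    static double idf(FieldStats s, ToLongFunction<FieldStats> collectionSize) {
        return idf(s.docFreq, collectionSize.applyAsLong(s));
    }

    // The two choices discussed in this thread.
    static final ToLongFunction<FieldStats> MAX_DOC = s -> s.maxDoc;
    static final ToLongFunction<FieldStats> DOC_COUNT = s -> s.docCount;

    public static void main(String[] args) {
        // Hypothetical statistics for a single field on one replica.
        FieldStats replica = new FieldStats(6_388_614L, 400_000L, 1_000L);
        System.out.printf("idf using maxDoc:   %.4f%n", idf(replica, MAX_DOC));
        System.out.printf("idf using docCount: %.4f%n", idf(replica, DOC_COUNT));
    }
}
```

In Solr the equivalent change would live in a custom similarity class referenced from the schema; which Lucene class to extend depends on the version in use.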

Markus
 
-----Original message-----
> From: Morten Bøgeskov <mb@dbc.dk>
> Sent: Thursday 5th January 2017 14:33
> To: solr-user@lucene.apache.org
> Subject: SolrCloud different score for same document on different replicas.
> 
> 
> 
> Hi.
> 
> We've got a SolrCloud which is sharded and has a replication factor of
> 2.
> 
> The 2 replicas of a shard may look like this:
> 
> Num Docs:    5401023
> Max Doc:    6388614
> Deleted Docs:    987591
> 
> 
> Num Docs:    5401023
> Max Doc:    5948122
> Deleted Docs:    547099
> 
> We've at times seen a >10% difference in Max Doc between replicas with
> the same Num Docs. Our use case is a few documents that are searched
> and many small ones that are filtered against (often updated multiple
> times a day), so the difference in deleted docs isn't surprising.
> 
> This results in a different score for a document depending on which
> replica it comes from. As I see it: it has to do with the different
> maxDoc value when calculating idf.
> 
> This in turn alters a specific document's position in the search
> results across reloads. This is quite confusing (e.g. duplicates when
> paginating).
> 
> What is the trick to getting homogeneous scores from different replicas?
> We've tried using ExactStatsCache & ExactSharedStatsCache, but that
> didn't seem to make any difference.
> 
> Any hints on this will be greatly appreciated.
> 
> -- 
>  Morten Bøgeskov <mb@dbc.dk>
> 
> 
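
A rough worked example of the maxDoc effect described in the quoted message, using the two replicas' Max Doc values. It assumes the classic TF-IDF form idf = 1 + ln(N / (df + 1)) with N = maxDoc, and it assumes the term's docFreq df happens to be equal on both replicas (in practice docFreq can also differ, since it too counts deleted documents):

```latex
\mathrm{idf}_A - \mathrm{idf}_B
  = \ln\frac{N_A}{df + 1} - \ln\frac{N_B}{df + 1}
  = \ln\frac{N_A}{N_B}
  = \ln\frac{6388614}{5948122}
  \approx 0.0714
```

Under these assumptions the shift is the same for every query term, so documents whose scores are close can change order depending on which replica served them.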
