lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Out of sync deletions causing differing IDF
Date Thu, 04 Aug 2016 12:12:16 GMT
Hello - your similarity should rely on numDocs instead of maxDoc; that solves the problem. I believe
it is already fixed in trunk, but I am not sure.
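
To illustrate why the choice of document count matters, here is a minimal sketch in Python (not Lucene code). It assumes the classic TF-IDF style formula idf = 1 + ln((N + 1) / (df + 1)); the exact formula depends on the Lucene version and the similarity in use. The replica names and all counts are hypothetical, maxDoc is approximated as live documents plus deleted documents not yet merged away, and docFreq is held fixed to isolate the effect of N (in a real index docFreq can also be affected by unmerged deletes).

```python
import math


def classic_idf(doc_freq: int, doc_count: int) -> float:
    """Classic TF-IDF style idf: 1 + ln((N + 1) / (df + 1)). Assumed formula."""
    return 1.0 + math.log((doc_count + 1) / (doc_freq + 1))


# Hypothetical replicas of one shard: identical live documents, but a
# different number of deleted documents still sitting in unmerged segments.
REPLICAS = {
    "replica_a": {"num_docs": 6_000_000, "deleted": 200_000},
    "replica_b": {"num_docs": 6_000_000, "deleted": 1_500_000},
}

# Hypothetical document frequency of the query term, held fixed on purpose.
DOC_FREQ = 50_000


def idf_by_replica(use_max_doc: bool) -> dict:
    """Compute the term's idf per replica using either maxDoc or numDocs as N."""
    result = {}
    for name, stats in REPLICAS.items():
        # Approximation: maxDoc counts live plus deleted documents.
        max_doc = stats["num_docs"] + stats["deleted"]
        n = max_doc if use_max_doc else stats["num_docs"]
        result[name] = classic_idf(DOC_FREQ, n)
    return result


idf_max_doc = idf_by_replica(use_max_doc=True)
idf_num_docs = idf_by_replica(use_max_doc=False)

for name in REPLICAS:
    print(name, "maxDoc-based idf:", idf_max_doc[name],
          "numDocs-based idf:", idf_num_docs[name])
```

With N taken from maxDoc, the replica carrying more unmerged deletes reports a higher idf for the same term; with N taken from numDocs, both replicas agree.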
Markus
 
-----Original message-----
> From: Upayavira <upayavira@odoko.co.uk>
> Sent: Thursday 4th August 2016 13:59
> To: solr-user@lucene.apache.org
> Subject: Out of sync deletions causing differing IDF
> 
> We have a system that sees a reasonable number of changes on a daily
> basis (maybe 60m docs, and around 1m updates per day). Using SolrCloud,
> the data is split into 10 shards and those shards are replicated.
> 
> What we are finding is that the number of deletions is causing differing
> maxDocs across the different replicas, and that is causing significantly
> different IDF values between replicas of the same shard, giving
> different scores and thus different orders depending upon which replica
> we hit.
> 
> I would have expected that, because the data is being indexed
> concurrently across replicas, the pattern of deletes and merges would
> be similar across replicas, but that doesn't seem to be the case in
> practice.
> 
> We could, of course, optimise the index to merge down to a single
> segment. This would clear all deletes out, but would leave us in a worse
> place for the future, as now most of our deletes would be concentrated
> into a single large segment.
> 
> Has anyone seen this sort of thing before, and does anyone have
> suggested strategies as to how to encourage IDF values into a similar
> range across replicas?
> 
> Upayavira
> 
