lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr relevancy score different on replicated nodes
Date Sat, 05 Jan 2019 00:35:25 GMT
Ashish:

Deleting and re-adding a replica is not a solution. Even if you did,
the replicas would be identical only until you started indexing again,
at which point the stats could skew a bit.

When you index to NRT replicas, the wall-clock times at which the
commits trigger will differ because of network delays. Essentially, a
doc gets indexed on the leader at time X but hits the follower Y
milliseconds later. So on the leader the autocommit interval expires
at time X+Z (Z being your autocommit interval), but on the follower it
expires at X+Y+Z. Meanwhile, some additional docs may already have
been indexed on the leader but not yet on the follower when the
autocommit triggers, so the newly-closed segment on the leader can
contain docs that the newly-closed segment on the follower does not.
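
To make the timing concrete, here is a minimal sketch in Python. It is a
toy model, not Solr code: it assumes the autocommit timer starts when the
first uncommitted doc arrives, and every arrival time and delay below is a
made-up number.

```python
def first_commit_segment(arrivals, autocommit_ms):
    """Return the doc ids captured by the first autocommit.

    Toy model: the timer starts when the first doc arrives and fires
    autocommit_ms later; every doc that has arrived by then lands in
    the newly-closed segment.
    """
    deadline = min(arrivals.values()) + autocommit_ms
    return sorted(doc for doc, t in arrivals.items() if t <= deadline)


# Hypothetical arrival times on the leader, in ms (X = 0).
leader_arrivals = {"doc1": 0, "doc2": 400, "doc3": 990, "doc4": 1050}

# Hypothetical per-doc network delays to the follower, in ms.
delays = {"doc1": 20, "doc2": 20, "doc3": 45, "doc4": 45}
follower_arrivals = {d: t + delays[d] for d, t in leader_arrivals.items()}

Z = 1000  # autocommit interval in ms (assumed)

leader_segment = first_commit_segment(leader_arrivals, Z)
follower_segment = first_commit_segment(follower_arrivals, Z)

print(leader_segment)    # ['doc1', 'doc2', 'doc3']
print(follower_segment)  # ['doc1', 'doc2']
```

Both replicas end up with all four docs after the next commit, but the
segment boundaries already differ.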

The point is that the term statistics (docFreq, for instance) do _not_
change when a document is deleted in some segment (and remember that
an update is really a delete followed by an add). The data associated
with deleted docs is not purged until segments are merged. Further,
the decision about which segments to merge is influenced by how many
documents are deleted in each.
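
The effect of unmerged deletes can be sketched with a toy model in Python.
This is an illustration only, not Lucene's data structures: the Doc class,
the helper functions and the sample docs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    terms: frozenset
    deleted: bool = False


def doc_freq(segments, term):
    """Count docs containing the term, deleted or not.

    Mimics the behavior described above: a delete only marks the doc,
    so it still contributes to the statistics until a merge.
    """
    return sum(1 for seg in segments for d in seg if term in d.terms)


def live_docs(segments):
    """Count docs that are not marked deleted."""
    return sum(1 for seg in segments for d in seg if not d.deleted)


def merge(segments):
    """Merging rewrites the segments and drops deleted docs for good."""
    return [[d for seg in segments for d in seg if not d.deleted]]


# doc1 was updated: the old version is marked deleted, a new one added.
old = Doc("doc1", frozenset({"solr", "score"}), deleted=True)
new = Doc("doc1", frozenset({"solr", "relevancy"}))
other = Doc("doc2", frozenset({"lucene"}))

replica_a = [[old, other], [new]]         # segments not merged yet
replica_b = merge([[old, other], [new]])  # segments already merged

print(live_docs(replica_a), live_docs(replica_b))                # 2 2
print(doc_freq(replica_a, "solr"), doc_freq(replica_b, "solr"))  # 2 1
```

Same live docs on both replicas, different docFreq for the same term.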

All of which means that the tf/idf statistics are (slightly) different
and you either have to use distributed IDF or just live with it.
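
For a sense of scale, here is a small Python sketch of how a slight skew in
the statistics moves the idf factor. It assumes the idf formula used by
Lucene's BM25Similarity; if your schema uses a different similarity the
formula differs. The docFreq and docCount values are invented.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf as computed by BM25Similarity (assumed similarity)."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical explain values from two replicas of the same shard.
leader_idf = bm25_idf(doc_freq=120, doc_count=10_000)
follower_idf = bm25_idf(doc_freq=117, doc_count=9_990)

print(f"leader idf:   {leader_idf:.4f}")
print(f"follower idf: {follower_idf:.4f}")
print(f"difference:   {abs(leader_idf - follower_idf):.4f}")
```

The difference is small, but it is enough to change the score and, for
docs with nearly equal scores, their order.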

You're saying that the document count of live documents is different,
and that's more concerning. Is this true for brief intervals or is it
true when there is _no_ indexing going on _and_ your autocommit
interval is allowed to expire? In that case it's a different problem.
However, if the condition is transitory and goes away if you stop
indexing, then it's the same issue I outlined above; autocommit is
happening at different wall-clock times.
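
One way to check is to stop indexing, wait for the autocommit interval to
expire, and ask each replica for its own count with distrib=false. A
minimal Python sketch follows; the core URLs in the usage note are
hypothetical placeholders, so substitute the ones from your cluster state.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_count_url(core_url):
    """Build a match-all query that is answered by this core only."""
    params = {"q": "*:*", "rows": 0, "distrib": "false", "wt": "json"}
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"


def num_found(response_text):
    """Extract numFound from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


def live_counts(core_urls, timeout=10):
    """Return {core_url: numFound} for each replica core."""
    counts = {}
    for url in core_urls:
        with urlopen(replica_count_url(url), timeout=timeout) as resp:
            counts[url] = num_found(resp.read().decode("utf-8"))
    return counts
```

Calling live_counts(["http://host1:8983/solr/coll_shard1_replica_n1",
"http://host2:8983/solr/coll_shard1_replica_n2"]) (made-up core names)
should return equal counts once indexing has stopped and commits have
happened. If it does not, that is the different problem.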

Best,
Erick

On Fri, Jan 4, 2019 at 11:12 AM Ashish Bisht <bishtashish77@gmail.com> wrote:
>
> Hi Erick,
>
> I have updated that I am not facing this problem in a new collection.
>
> As per 3) I can try deleting a replica and adding it again, but the
> confusion is which one of the two I should delete (wondering which
> replica is giving the correct score for the query).
>
> Both replicas give the same number of docs for an all-docs query. It's
> strange that docCount and docFreq differ in the query explain output.
>
> Regards
> Ashish
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
