lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-8586) Implement hash over all documents to check for shard synchronization
Date Thu, 11 Feb 2016 04:23:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142215#comment-15142215 ]

Yonik Seeley edited comment on SOLR-8586 at 2/11/16 4:22 AM:
-------------------------------------------------------------

bq. Yep, I've been looping a custom version of the HDFS-nothing-safe test that among other
things, only does adds, no deletes.

Update: when I reverted my custom changes to the chaos test (so that it also did deletes),
I got a large number of shard-out-of-sync errors... seemingly even more than before, so I've
been trying to track those down.  What I saw were issues that did not look related to PeerSync:
documents were missing from a shard that had replicated from the leader while buffering documents,
and I saw those missing documents come in and get buffered, which points to transaction log
buffering or replay issues.

Then I realized that I had run the "adds only" test before committing, and had run the normal
test after committing and doing a "git pull".  In between those two runs came SOLR-8575, which was
a fix to the HDFS tlog!  I've been looping the test for a number of hours with those changes
reverted, and I haven't seen a shards-out-of-sync failure so far.  I've also done a quick review
of SOLR-8575, but didn't see anything obviously incorrect.  The changes in that issue may
just be uncovering another bug (due to timing) rather than causing one... too early to tell.

I've also been running the non-HDFS version of the test for over a day, and have likewise seen
no inconsistent-shard failures.


> Implement hash over all documents to check for shard synchronization
> --------------------------------------------------------------------
>
>                 Key: SOLR-8586
>                 URL: https://issues.apache.org/jira/browse/SOLR-8586
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>             Fix For: 5.5, master
>
>         Attachments: SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch, SOLR-8586.patch
>
>
> An order-independent hash across all of the versions in the index should suffice.  The
> hash itself is pretty easy, but we need to figure out when/where to do this check (for example,
> I think PeerSync is currently used in multiple contexts and this check would perhaps not be
> appropriate for all PeerSync calls?)
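
For illustration, here is a minimal sketch of what an order-independent hash over document
versions could look like. It is not taken from the attached patches: the class and method
names (VersionHashSketch, hashVersions) are hypothetical, and the choice of the MurmurHash3
64-bit finalizer as the per-version mixer is an assumption. The idea is only that each
version is scrambled independently and the results are summed, so the order in which
versions are visited does not matter.

```java
// Hypothetical sketch: an order-independent hash over a set of document versions.
// Not the code from the SOLR-8586 patches; names and mixer choice are assumptions.
final class VersionHashSketch {

    /** MurmurHash3 64-bit finalizer: spreads the bits of a single version value. */
    static long fmix64(long k) {
        k ^= k >>> 33;
        k *= 0xff51afd7ed558ccdL;
        k ^= k >>> 33;
        k *= 0xc4ceb9fe1a85ec53L;
        k ^= k >>> 33;
        return k;
    }

    /**
     * Sums the mixed hash of every version. Addition (with long overflow wrapping)
     * is commutative and associative, so any iteration order gives the same result.
     */
    static long hashVersions(long[] versions) {
        long sum = 0L;
        for (long version : versions) {
            sum += fmix64(version);
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] leader = {1001L, 1002L, 1003L};
        long[] replicaSameDocs = {1003L, 1001L, 1002L};   // same versions, different order
        long[] replicaMissingDoc = {1001L, 1003L};        // version 1002 is missing

        System.out.println(hashVersions(leader) == hashVersions(replicaSameDocs));   // true
        System.out.println(hashVersions(leader) == hashVersions(replicaMissingDoc)); // false
    }
}
```

Summing mixed values rather than XOR-ing raw versions is a deliberate choice in this sketch:
mixing first keeps nearby version numbers from cancelling in simple patterns, and a sum,
unlike XOR, does not cancel out when the same value is counted twice. As with any 64-bit
hash, equal results make a mismatch unlikely rather than impossible.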



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

