lucene-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <>
Subject [jira] [Commented] (SOLR-6530) Commits under network partition can put any node in down state by any node
Date Fri, 19 Sep 2014 06:21:34 GMT


Shalin Shekhar Mangar commented on SOLR-6530:

[~romseygeek] - That sounds right. This fix will help to a great extent but it isn't perfect.
I think we may need to add some intelligence to the overseer so that it rejects invalid state
transitions in the cluster state. SOLR-6538 can also help in resolving such issues, e.g. a leader
that has been set to the down state isn't aware of its state and will never try to get out of it.

[~andyetitmoves] - Yes, you're right that it could happen. We need to refactor this code such
that these rules are properly defined and enforced.
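
To make "properly defined and enforced" a bit more concrete, here is a minimal sketch of what a central transition check might look like. It is not Solr's actual Overseer code: the class name, the isAllowed method and the two rules are assumptions chosen only for illustration (a node may always publish its own state; only the shard leader may mark another replica down).

```java
// Hypothetical sketch, not Solr's Overseer API. All names here are illustrative.
class TransitionRulesSketch {
    enum State { ACTIVE, RECOVERING, DOWN }

    /**
     * Decides whether a requested state change should be applied.
     *
     * @param requester node asking for the change
     * @param target    node whose state would change
     * @param leader    current leader of the shard
     * @param to        requested new state for the target
     */
    static boolean isAllowed(String requester, String target, String leader, State to) {
        // Assumed rule 1: a node may always publish its own state.
        if (requester.equals(target)) {
            return true;
        }
        // Assumed rule 2: only the shard leader may put another replica in DOWN.
        // Since requester != target here, the target cannot be the leader.
        return to == State.DOWN && requester.equals(leader);
    }

    public static void main(String[] args) {
        // Replica B tries to put leader A in DOWN: rejected.
        System.out.println(isAllowed("B", "A", "A", State.DOWN));
        // Leader A puts replica B in DOWN: accepted.
        System.out.println(isAllowed("A", "B", "A", State.DOWN));
        // B publishes its own RECOVERING state: accepted.
        System.out.println(isAllowed("B", "B", "A", State.RECOVERING));
    }
}
```

With rules like these in one place, a request from a non-leader to put the leader in the down state would be dropped rather than applied to the cluster state.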

> Commits under network partition can put any node in down state by any node
> --------------------------------------------------------------------------
>                 Key: SOLR-6530
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Priority: Critical
>             Fix For: 5.0, 6.0
>         Attachments: SOLR-6530.patch, SOLR-6530.patch, SOLR-6530.patch
> Commits are executed by any node in SolrCloud i.e. they're not routed via the leader like other updates.
> # Suppose there's 1 collection, 1 shard, 2 replicas (A and B) and A is the leader
> # Suppose a commit request is made to node B during a time where B cannot talk to A due to a partition for any reason (failing switch, heavy GC, whatever)
> # B fails to distribute the commit to A (times out) and asks A to recover
> # This was okay earlier because a leader just ignores recovery requests but with leader initiated recovery code, B puts A in the "down" state and A can never get out of that state.
> tl;dr; During network partitions, if enough commit/optimize requests are sent to the cluster, all the nodes in the cluster will eventually be marked as "down".
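
The scenario quoted above can be sketched as a toy model. This is not SolrCloud code: the class, its methods and the "active"/"down" strings are assumptions for illustration. It also assumes that the node receiving the commit marks every replica it cannot reach as down, and that a node already marked down still accepts commits because it is unaware of its own state.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the described failure mode; all names are hypothetical.
class PartitionCommitSketch {
    // Replica name -> published state ("active" or "down").
    final Map<String, String> states = new LinkedHashMap<>();

    PartitionCommitSketch(String... replicas) {
        for (String replica : replicas) {
            states.put(replica, "active");
        }
    }

    // A commit arrives at `receiver` while it cannot reach any other replica.
    // Distribution times out, so the receiver marks every other replica down.
    void commitDuringPartition(String receiver) {
        states.replaceAll((name, state) -> name.equals(receiver) ? state : "down");
    }

    boolean allDown() {
        return states.values().stream().allMatch("down"::equals);
    }

    public static void main(String[] args) {
        PartitionCommitSketch cluster = new PartitionCommitSketch("A", "B");
        cluster.commitDuringPartition("B"); // B cannot reach leader A, marks A down
        System.out.println(cluster.states);
        cluster.commitDuringPartition("A"); // A cannot reach B, marks B down
        System.out.println(cluster.allDown());
    }
}
```

In this model two commits, one per side of the partition, are enough to leave both replicas marked down, which matches the tl;dr above.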
