lucene-dev mailing list archives

From "Varun Thacker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9226) Automatically fire FORCELEADER if shard leader is missing
Date Mon, 20 Jun 2016 12:11:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339400#comment-15339400 ]

Varun Thacker commented on SOLR-9226:
-------------------------------------

bq. * FORCELEADER command is executed in the node that receives the command. It should be moved to overseer to ensure that we don't run multiple such commands in parallel.


Maybe we can commit this as part of SOLR-8554.

> Automatically fire FORCELEADER if shard leader is missing
> ---------------------------------------------------------
>
>                 Key: SOLR-9226
>                 URL: https://issues.apache.org/jira/browse/SOLR-9226
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>
> We have often seen shards lose their leader.
> {code}
> x:lamp_2016050713_shard2_replica1] o.a.s.c.ZkController Error getting leader from zk
> org.apache.solr.common.SolrException: Could not get leader props
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1044)
>         at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1011)
>         at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:967)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:906)
>         at org.apache.solr.cloud.ZkController.register(ZkController.java:849)
>         at org.apache.solr.core.ZkContainer$2.run(ZkContainer.java:183)
> {code}
> There could be other instances as well.
> I recommend the following to heal such clusters:
> * Whenever a node finds that a shard has no leader, it should fire the FORCELEADER command.
> * FORCELEADER command is executed in the node that receives the command. It should be moved to overseer to ensure that we don't run multiple such commands in parallel.
> * The command should make a best effort to identify a leader, and should assign one if at least one node in the shard is live.
> * When a shard has lost its leader, it is very likely that thousands of such requests will be fired, and they would clog the work queue. The command should ensure that duplicate FORCELEADER requests are consumed from the work queue.
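
The last point in the description, keeping duplicate FORCELEADER requests from piling up, could be sketched roughly as follows. This is only an illustration and not Solr code: the class ForceLeaderQueue, its methods, and the "collection/shard" key format are hypothetical names made up for this sketch, and it assumes a single in-process queue rather than the overseer's ZooKeeper-backed work queue.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical sketch (not part of Solr): a work queue that holds at most
 * one pending FORCELEADER request per shard.
 */
final class ForceLeaderQueue {
    // Insertion-ordered set: keeps FIFO order while rejecting duplicate keys.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Returns true if the request was enqueued, false if one is already pending for this shard. */
    synchronized boolean offer(String collection, String shard) {
        // Assumed key format for this sketch only.
        return pending.add(collection + "/" + shard);
    }

    /** Removes and returns the oldest pending request key, if there is one. */
    synchronized Optional<String> poll() {
        Iterator<String> it = pending.iterator();
        if (!it.hasNext()) {
            return Optional.empty();
        }
        String next = it.next();
        it.remove();
        return Optional.of(next);
    }

    /** Number of distinct shards with a pending request. */
    synchronized int size() {
        return pending.size();
    }
}
```

With this shape, a thousand nodes reporting the same leaderless shard would leave a single entry in the queue, and a new request for that shard is accepted again only after the pending one has been polled.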



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

