lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-10285) Skip LEADER messages when there are leader only shards
Date Tue, 03 Oct 2017 04:29:02 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16187638#comment-16187638 ]

Cao Manh Dat edited comment on SOLR-10285 at 10/3/17 4:28 AM:
--------------------------------------------------------------

Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I searched the code and found that:
- ElectionContext is the only place that uses OverseerAction.LEADER (once to unset the leader and once to set the leader).
- STATE_PROP, used in the second case, is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" is unfounded, right?

The only problem that can occur during an upgrade is:
1. A replica (repA) is currently the leader.
2. The Overseer is very busy.
3. repA issues the unset-leader operation (which is delayed because the Overseer is very busy).
4. repA is stopped in the middle of the election process (so the set-leader operation never gets executed).
5. repA starts with the new code and sees that it is still the leader (the unset operation from step 3 has not been executed yet), so it skips the set-leader operation.

I think the above case is very rare, and even if it happens (it can be fixed with the FORCELEADER API), sysadmins must first deal with the excessive number of operations queued in the Overseer.
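
The five-step scenario above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the code from the patch: {{Shard}}, {{publishLeaderIfNeeded}} and the in-memory queue are hypothetical stand-ins for the cluster state, the new skip logic and the Overseer work queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class LeaderRaceSketch {
    /** Hypothetical model: the leader recorded in cluster state plus a backlog of Overseer messages. */
    static final class Shard {
        String leader; // null means no leader is recorded
        final Queue<Runnable> overseerQueue = new ArrayDeque<>();

        void enqueueUnsetLeader() { overseerQueue.add(() -> leader = null); }
        void enqueueSetLeader(String replica) { overseerQueue.add(() -> leader = replica); }

        /** The Overseer catches up and applies every pending message in order. */
        void drainOverseerQueue() {
            Runnable message;
            while ((message = overseerQueue.poll()) != null) {
                message.run();
            }
        }
    }

    /** Assumed new behaviour: skip the set-leader message if the state already names this replica. */
    static boolean publishLeaderIfNeeded(Shard shard, String replica) {
        if (replica.equals(shard.leader)) {
            return false; // skipped
        }
        shard.enqueueSetLeader(replica);
        return true;
    }

    /** Replays steps 1 to 5 and returns the leader recorded once the Overseer has caught up. */
    static String runScenario() {
        Shard shard = new Shard();
        shard.leader = "repA";                // step 1: repA is the leader
                                              // step 2: the Overseer is busy, nothing is drained
        shard.enqueueUnsetLeader();           // step 3: unset-leader is queued but delayed
                                              // step 4: repA stops before queuing set-leader
        publishLeaderIfNeeded(shard, "repA"); // step 5: state still says repA, so set-leader is skipped
        shard.drainOverseerQueue();           // the stale unset-leader is finally applied
        return shard.leader;
    }

    public static void main(String[] args) {
        System.out.println("leader after restart: " + runScenario());
    }
}
```

In this model the shard ends up with no recorded leader, because the stale unset-leader message is applied after the restarted replica has already decided to skip its set-leader message.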




was (Author: caomanhdat):
Hi [~jhump], your patch looks good to me. Regarding your TODO notes, I searched the code and found that:
- ElectionContext is the only place that uses OverseerAction.LEADER (once to unset the leader and once to set the leader).
- STATE_PROP, used in the second case, is the replica's state, which is not even used in {{SliceMutator.setShardLeader}}.

So your concern about "mark the shard as inactive" is unfounded, right?

The only problem that can occur during an upgrade is:
1. A replica (repA) is currently the leader.
2. The Overseer is very busy.
3. repA issues the unset-leader operation (which is delayed because the Overseer is very busy).
4. repA is stopped in the middle of the election process (so the set-leader operation never gets executed).
5. repA starts with the new code and sees that it is still the leader (the unset operation from step 3 has not been executed yet), so it skips the set-leader operation.

I think the above case is very rare, and even if it happens, sysadmins must first deal with the excessive number of operations queued in the Overseer.



> Skip LEADER messages when there are leader only shards
> ------------------------------------------------------
>
>                 Key: SOLR-10285
>                 URL: https://issues.apache.org/jira/browse/SOLR-10285
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Varun Thacker
>            Assignee: Cao Manh Dat
>         Attachments: SOLR-10285.patch, SOLR-10285.patch, SOLR-10285.patch
>
>
> For shards that have only one replica (the leader), we know it doesn't need to recover from anyone. We should short-circuit the recovery process in this case.
> The motivation is that we will generate fewer state events and be able to mark these replicas as active again without them needing to go into the 'recovering' state.
> We already short-circuit when you set {{-Dsolrcloud.skip.autorecovery=true}}, but that sys prop was meant for tests only. Extending this so that the code also short-circuits when the core knows it's the only replica in the shard is the motivation of this Jira.
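
The decision described in the issue could look roughly like the sketch below. It is a minimal illustration, assuming the check needs only the replica count, the leader flag and the existing system property; {{shouldSkipRecovery}} is a hypothetical helper, not a method from Solr or from the attached patches.

```java
public class RecoveryShortCircuitSketch {
    /**
     * Hypothetical helper: returns true when recovery can be skipped.
     *
     * @param replicasInShard  number of replicas the shard currently has
     * @param isLeader         whether this core is the shard leader
     * @param skipAutoRecovery value of the existing test-only switch
     */
    static boolean shouldSkipRecovery(int replicasInShard, boolean isLeader, boolean skipAutoRecovery) {
        if (skipAutoRecovery) {
            return true; // existing behaviour, intended for tests only
        }
        // Proposed behaviour: a sole replica that is the leader has nobody to recover from.
        return replicasInShard == 1 && isLeader;
    }

    public static void main(String[] args) {
        // Reads -Dsolrcloud.skip.autorecovery=true if it was passed to the JVM.
        boolean skipProp = Boolean.getBoolean("solrcloud.skip.autorecovery");
        System.out.println("skip recovery: " + shouldSkipRecovery(1, true, skipProp));
    }
}
```

Keeping the system-property check separate from the replica-count check keeps the test-only switch independent of the new production path.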



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

