lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6491) Add preferredLeader as a ROLE and a collections API command to respect this role
Date Tue, 16 Sep 2014 17:37:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135786#comment-14135786 ]

Erick Erickson commented on SOLR-6491:
--------------------------------------

Right, there's an ecosystem here, with several related bits that need to come together. Here's
my current vision, at least; I'm completely open to suggestions...

1> assigning preferred leader roles.
1a> manually via the ADDREPLICAROLE command (SOLR-6512)
1b> automatically via SOLR-6513. NOTE: this will just update the clusterstate with the
role assignments for a single collection; it won't trigger any leader re-election.

2> giving preference during leader election to those assigned roles. Not absolutely enforcing
that assignment, but putting the preferred leader at the head of the list whenever leader
election is happening for some other reason. Mechanism TBD. I don't think this has a lot of
impact on the system, but you never know. Probably needs a new JIRA, since this one is turning
into an umbrella JIRA.

3> when the system does get out of whack, a collections API to "try to make all the leaders
the preferred leader now". The mechanism here is very much TBD. The last thing we need to
do is flood the system with a zillion leader elections, so it'll have to be throttled somehow.
(SOLR-6517)
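
To make <2> a bit more concrete, here's a rough sketch of "put the preferred leader at the head of the list". This is illustration only: the function name and the data shapes are hypothetical, and it is not how Solr's election code is actually structured. The preferred replica moves to the front only if its node is live; otherwise the existing order is left alone.

```python
def order_candidates(candidates, preferred, live_nodes):
    """Return election candidates with the preferred replica first, if it is live.

    candidates: list of (replica_name, node_name) in their existing queue order.
    preferred:  replica_name carrying the preferredLeader role, or None.
    live_nodes: set of node names that are currently live.
    Hypothetical sketch; not Solr's real leader election mechanism.
    """
    head = [c for c in candidates if c[0] == preferred and c[1] in live_nodes]
    # Everyone else keeps their relative order, so the preference is a nudge,
    # not an absolute assignment.
    rest = [c for c in candidates if c not in head]
    return head + rest
```

If the preferred replica's node is down, or no replica has the role, the list comes back unchanged, which matches the "not absolutely enforcing" intent.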

It may be that <3> becomes an infrequent event. With <2> in place, the system
will tend towards the leadership topology that's set up, but I'm pretty sure there'll still
be occasions when it'll be required.

And consider the pathological situation we face now. Hypothetically _all_ the leaders can
be on a single node. In fact situations approaching this have been observed "in the field".
If _that_ node goes down, all leaders are elected at once.
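
For what it's worth, spotting that lopsided situation is just a matter of counting leaders per node. A minimal sketch, assuming the leader assignments have already been pulled out of the clusterstate into a plain dict (the function name and the dict shape are made up for illustration):

```python
from collections import Counter


def leaders_per_node(shard_leaders):
    """Count how many shard leaders each node hosts.

    shard_leaders: dict mapping shard name -> node hosting that shard's leader.
    Hypothetical input shape; not the actual clusterstate format.
    """
    return Counter(shard_leaders.values())
```

The count for a node is also the number of elections that would kick off at once if that node went down.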

Anyway, as I said this isn't cast in stone. It does seem that one approach would be a background
process (possibly in the Overseer code?) whose job is to try to keep things in balance by
issuing "re-elect leader" commands whenever the actual leader isn't the preferred leader and
the node that _is_ the preferred leader is live. 
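
The check such a background process would run could look roughly like the following. It's a sketch under assumptions: the function name and the per-shard dict layout are hypothetical, and where this would live (Overseer or elsewhere) is still open.

```python
def shards_needing_reelection(shards, live_nodes):
    """List shards whose leader is not the preferred leader while the preferred one is live.

    shards: dict mapping shard name -> {
        "leader": replica name of the current leader,
        "preferred": replica name with the preferredLeader role, or None,
        "replica_nodes": dict of replica name -> node name,
    }
    live_nodes: set of node names that are currently live.
    Hypothetical data layout, for illustration only.
    """
    out = []
    for name, shard in shards.items():
        preferred = shard.get("preferred")
        if preferred is None or preferred == shard["leader"]:
            continue  # nothing assigned, or already in the desired state
        if shard["replica_nodes"].get(preferred) in live_nodes:
            out.append(name)  # preferred replica is live but not leading
    return out
```

Shards whose preferred replica sits on a dead node are skipped, so no "re-elect leader" command is issued that couldn't succeed anyway.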

Or maybe a way to delay execution of Overseer tasks for N seconds. The idea here would be
that <3> above would find all the leaders that were improperly assigned and issue all
the "relectMeAsLeaderIfPossible" commands at once, but each with a time in the future to run.
So you'd get Overseer commands like "relectMeAsLeaderIfPossible in 5 seconds" for some node,
"relectMeAsLeaderIfPossible in 10 seconds" for the next, etc., to keep from flooding the system
with election requests.
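
The staggering itself is simple arithmetic. A minimal sketch, assuming a fixed 5-second step; the function name is hypothetical, and how the Overseer would actually hold a task until its run time is not addressed here:

```python
def staggered_schedule(shards, step_seconds=5):
    """Pair each shard with a delay: one step for the first, two for the next, and so on.

    shards: iterable of shard names that need a leader change.
    step_seconds: spacing between consecutive commands (assumed value).
    """
    return [(shard, step_seconds * (i + 1)) for i, shard in enumerate(shards)]
```

With three shards and the default step this gives delays of 5, 10 and 15 seconds, so at most one election request is released per step.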

But I'm getting into code that has lots of complications, so any guidance is quite welcome.

> Add preferredLeader as a ROLE and a collections API command to respect this role
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-6491
>                 URL: https://issues.apache.org/jira/browse/SOLR-6491
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.11, 5.0
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> Leaders can currently get out of balance due to the sequence of how nodes are brought
> up in a cluster. For very good reasons shard leadership cannot be permanently assigned.
> However, it seems reasonable that a sys admin could optionally specify that a particular
> node be the _preferred_ leader for a particular collection/shard. During leader election,
> preference would be given to any node so marked when electing any leader.
> So the proposal here is to add another role for preferredLeader to the collections API,
> something like
> ADDROLE?role=preferredLeader&collection=collection_name&shard=shardId
> Second, it would be good to have a new collections API call like ELECTPREFERREDLEADERS?collection=collection_name
> (I really hate that name so far, but you see the idea). That command would (asynchronously?)
> make an attempt to transfer leadership for each shard in a collection to the leader labeled
> as the preferred leader by the new ADDROLE role.
> I'm going to start working on this, any suggestions welcome!
> This will subsume several other JIRAs, I'll link them momentarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


