helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-543) Single partition unnecessarily moved
Date Tue, 25 Oct 2016 16:05:58 GMT

    [ https://issues.apache.org/jira/browse/HELIX-543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15605704#comment-15605704 ]

ASF GitHub Bot commented on HELIX-543:

GitHub user lei-xia opened a pull request:


    [HELIX-543] Avoid moving partitions unnecessarily when auto-rebalancing.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lei-xia/helix helix-0.6.x

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #56
commit 45ebe767533a9c014bf37c30e4a6a62652538b5a
Author: Lei Xia <lxia@linkedin.com>
Date:   2016-10-25T16:01:35Z

    [HELIX-543] Avoid moving partitions unnecessarily when auto-rebalancing.


> Single partition unnecessarily moved
> ------------------------------------
>                 Key: HELIX-543
>                 URL: https://issues.apache.org/jira/browse/HELIX-543
>             Project: Apache Helix
>          Issue Type: Bug
>          Components: helix-core
>    Affects Versions: 0.7.1, 0.6.4
>            Reporter: Tom Widmer
>            Assignee: kishore gopalakrishna
>            Priority: Minor
> (Copied from mailing list)
> I have some resources that I use with the OnlineOffline state model but which only have
a single partition at the moment (essentially, Helix is just giving me a simple leader election
to decide who controls the resource - I don’t care which participant has it, as long as only
one does). However, with full-auto rebalancing, I find that the ‘first’ instance (alphabetically,
I think) always gets the resource when it’s up. So if I take down the first node so that the
partition transfers to the 2nd node, then bring the 1st node back up, the resource transfers
back unnecessarily.
> Note that this issue also affects multi-partition resources; it’s just less noticeable.
(With 3 nodes and 4 partitions, say, the partitions are always allocated 2, 1, 1. If node
1 goes down, the allocation becomes 0, 2, 2, and when node 1 comes back up, 2 partitions are
moved unnecessarily to restore 2, 1, 1, rather than the minimum move to achieve ‘balance’,
which would be to move 1 partition from instance 2 or 3 back to instance 1.)
> I can see the code in question in AutoRebalanceStrategy.typedComputePartitionAssignment,
where the distRemainder is allocated to the first nodes alphabetically, so that the capacities
of the nodes are not all equal.
> The proposed solution is to sort the nodes by the number of partitions they already have
assigned, so that the nodes already holding more partitions receive the higher capacity and
the problem goes away.
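
The idea above can be illustrated with a minimal sketch (this is not the actual Helix code;
the class, method, and node names are hypothetical). With 4 partitions and 3 nodes, each node
gets a base capacity of 1 and one node absorbs the remainder of 1. If nodes are ordered
alphabetically, the first node always gets capacity 2; ordering them by how many partitions
they currently hold gives the extra capacity to a node that already has partitions, so only
the minimum number of moves is needed when a node rejoins:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed fix, not the Helix implementation:
// hand the remainder capacity to the nodes that already hold the most
// partitions, instead of to the alphabetically-first nodes.
public class RemainderSketch {
    // Returns the capacity for each node. Nodes are ordered by current
    // partition count (descending), tie-broken by name for determinism,
    // before the remainder is distributed.
    static Map<String, Integer> capacities(int partitionCount,
                                           Map<String, Integer> currentCounts) {
        List<String> nodes = new ArrayList<>(currentCounts.keySet());
        nodes.sort((a, b) -> {
            // More currently-held partitions first.
            int cmp = Integer.compare(currentCounts.get(b), currentCounts.get(a));
            return cmp != 0 ? cmp : a.compareTo(b);
        });
        int base = partitionCount / nodes.size();
        int remainder = partitionCount % nodes.size();
        Map<String, Integer> caps = new LinkedHashMap<>();
        for (String n : nodes) {
            // The first `remainder` nodes in sorted order get one extra slot.
            caps.put(n, base + (remainder-- > 0 ? 1 : 0));
        }
        return caps;
    }

    public static void main(String[] args) {
        // Scenario from the issue: node1 was down, node2 and node3 hold
        // 2 partitions each; node1 has just rejoined with 0.
        Map<String, Integer> current = new HashMap<>();
        current.put("node1", 0);
        current.put("node2", 2);
        current.put("node3", 2);
        // With the sort, a node already holding 2 partitions keeps the extra
        // capacity, so only one partition needs to move to node1.
        System.out.println(capacities(4, current));
    }
}
```

Under the old alphabetical ordering, node1 would receive capacity 2 on rejoining and two
partitions would move; with the count-based ordering, node2 keeps capacity 2 and the single
required partition moves to node1.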

This message was sent by Atlassian JIRA
