helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-400) 0.6.x still calls the old rebalancing algorithm for no reason
Date Wed, 02 Nov 2016 21:09:59 GMT

    [ https://issues.apache.org/jira/browse/HELIX-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15630532#comment-15630532 ]

ASF GitHub Bot commented on HELIX-400:

GitHub user mkscrg opened a pull request:


    helix-core: AutoRebalancer should include only numbered states in `currentMapping`

    AutoRebalancer constructs a `currentMapping` (`Map<PartitionId, Map<ParticipantId,
State>>`) which it passes to `AutoRebalanceStrategy#computePartitionAssignment()`. `ARS`
uses the mapping to sort the live nodes by # of partitions they hold.
    In `helix-0.6.x`, `currentMapping` includes _all states_, including "null" states like
`DROPPED` or `OFFLINE`. This breaks `ARS`'s node sorting, causing it to incorrectly move partitions
when nodes restart after disconnecting.
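To make the intended filtering concrete, here is a minimal sketch rather than the actual patch. It uses plain `String` keys in place of Helix's `PartitionId`, `ParticipantId` and `State` types. The class name, the method name and the hard-coded set of unnumbered states are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class CurrentMappingFilterSketch {
    // Assumption: states that carry no replica count, hard-coded for this sketch.
    static final Set<String> UNNUMBERED_STATES = Set.of("OFFLINE", "DROPPED");

    /** Returns a copy of the mapping that keeps only replicas in a numbered state. */
    static Map<String, Map<String, String>> numberedOnly(
            Map<String, Map<String, String>> currentMapping) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> partition : currentMapping.entrySet()) {
            Map<String, String> kept = new LinkedHashMap<>();
            for (Map.Entry<String, String> replica : partition.getValue().entrySet()) {
                // Skip replicas whose state is OFFLINE, DROPPED, etc.
                if (!UNNUMBERED_STATES.contains(replica.getValue())) {
                    kept.put(replica.getKey(), replica.getValue());
                }
            }
            result.put(partition.getKey(), kept);
        }
        return result;
    }
}
```

With such a filter, a node that has just rejoined and reports a partition as `OFFLINE` would no longer be counted as holding that partition.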
    `helix-0.7.x` does not have this issue. It was introduced between `0.6.2-incubating` and
    > [[HELIX-400] Remove all references to the old full auto rebalancing code](https://github.com/apache/helix/commit/8d99778a30d10f529ee0757286efa84ea581b5bf)
    See also
    - the recent port of [HELIX-543] (#56) to `helix-0.6.x`, which intended to avoid unnecessary
partition movement. That port was ineffective due to this issue.
    - [mailing list](http://mail-archives.apache.org/mod_mbox/helix-user/201610.mbox/%3CCAC56g41ejjcSi1P-Ohp3esyGqemBgFoji2Gy8tZQnJMo156OpA%40mail.gmail.com%3E)
thread for more background
    ### Example
    Consider this scenario:
    - OnlineOffline state model
    - 2 nodes, "NODE_0" and "NODE_1"
    - 1 resource "P" with 1 replica and 1 partition
      > currentMapping: `{P: {NODE_0: ONLINE}}`
    - stop NODE_0
      > currentMapping: `{P: {NODE_1: ONLINE}}`
    - start NODE_0
      > currentMapping: `{P: {NODE_0: OFFLINE, NODE_1: ONLINE}}`
    `ARS#computePartitionAssignment()` sorts the live nodes by the # of partitions they hold,
based on `currentMapping`, then reassigns partitions based on that sort. (The sort breaks
ties by comparing the node names.) So after restarting `NODE_0`, the sort is `[NODE_0, NODE_1]`,
and the `ONLINE` partition is incorrectly moved back to `NODE_0`.
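The effect on the node sort can be reproduced with a small standalone sketch. This is not the `AutoRebalanceStrategy` code. The class and method names are hypothetical, and the ordering (most partitions first, ties broken by node name) is an assumption chosen to match the behaviour described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NodeSortSketch {
    /** Counts, for each live node, how many partitions the mapping lists it under. */
    static Map<String, Integer> partitionCounts(
            List<String> liveNodes, Map<String, Map<String, String>> currentMapping) {
        Map<String, Integer> counts = new HashMap<>();
        for (String node : liveNodes) {
            counts.put(node, 0);
        }
        for (Map<String, String> replicas : currentMapping.values()) {
            for (String node : replicas.keySet()) {
                // Every entry counts, whatever its state; nodes that are not live are ignored.
                counts.computeIfPresent(node, (name, count) -> count + 1);
            }
        }
        return counts;
    }

    /** Assumed order: nodes holding more partitions first, ties broken by node name. */
    static List<String> sortNodes(
            List<String> liveNodes, Map<String, Map<String, String>> currentMapping) {
        Map<String, Integer> counts = partitionCounts(liveNodes, currentMapping);
        List<String> sorted = new ArrayList<>(liveNodes);
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return sorted;
    }
}
```

Under these assumptions, `{P: {NODE_0: OFFLINE, NODE_1: ONLINE}}` gives both nodes a count of 1, so the name tie-break puts `NODE_0` first. With the `OFFLINE` entry left out, `NODE_1` is the only holder and sorts first.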

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mkscrg/helix rebalance-numbered-states-only

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #58

commit 131e67bd7d98ae18eb4bbe0356cdd3a088f12c18
Author: Mike Craig <mcraig@box.com>
Date:   2016-11-02T20:22:11Z

    helix-core: AutoRebalancer should include only numbered states in `currentMapping`


> 0.6.x still calls the old rebalancing algorithm for no reason
> -------------------------------------------------------------
>                 Key: HELIX-400
>                 URL: https://issues.apache.org/jira/browse/HELIX-400
>             Project: Apache Helix
>          Issue Type: Sub-task
>            Reporter: Kanak Biscuitwala
>            Assignee: Kanak Biscuitwala
>             Fix For: 0.6.3
> After calling the new algorithm, the old algorithm is called. Typically this is a no-op,
except in the case of disabled partitions, where it might do the wrong thing. In any case,
this shouldn't exist.

This message was sent by Atlassian JIRA
