helix-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-682) Stale message should not prevent controller from rebalancing resource
Date Wed, 21 Mar 2018 02:04:00 GMT

    [ https://issues.apache.org/jira/browse/HELIX-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407337#comment-16407337 ]

ASF GitHub Bot commented on HELIX-682:
--------------------------------------

GitHub user zhan849 opened a pull request:

    https://github.com/apache/helix/pull/156

    [HELIX-682] controller should delete obsolete messages with timeout to unblock state transition

    This change contains the implementation and tests for the controller: during
    MessageGenerationPhase, it checks whether a pending message should be cleaned up on the
    participant to unblock further state transitions:

    - If the partition's current state is the same as the message's toState and the 3 sec
      timeout has already passed, the participant has likely failed to delete the message,
      and the controller should proactively remove it so that further rebalancing is unblocked.
    - If the partition's current state is the same as the message's fromState, the partition
      is either undergoing the state transition or has not started it yet. In this case, the
      controller does nothing.
    - If the partition's current state is neither the message's fromState nor its toState
      (almost impossible), the message is problematic, and it is safe to delete it immediately
      so that the participant does not handle an unnecessary message.

    Message deletion on the controller side is asynchronous.
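
    The three cases above can be sketched as a small decision function. This is only an
    illustrative sketch, not the code in the pull request: the class StaleMessageCheck, the
    Action enum, the decide method, and the elapsed parameter are hypothetical names, and it
    assumes the 3 sec timeout is measured from the moment the controller first saw the
    current state match the message's toState.

```java
import java.time.Duration;

// Hypothetical helper; none of these names come from the Helix code base.
final class StaleMessageCheck {
    enum Action { KEEP, DELETE }

    // The 3 second timeout mentioned in the description above.
    static final Duration CLEANUP_TIMEOUT = Duration.ofSeconds(3);

    /**
     * Decide what to do with a pending state-transition message.
     *
     * @param currentState the partition's current state on the participant
     * @param fromState    the message's fromState
     * @param toState      the message's toState
     * @param elapsed      assumed: time since the controller first saw
     *                     currentState equal to toState
     */
    static Action decide(String currentState, String fromState,
                         String toState, Duration elapsed) {
        if (currentState.equals(toState)) {
            // Transition is done; give the participant time to clean up itself.
            return elapsed.compareTo(CLEANUP_TIMEOUT) >= 0 ? Action.DELETE : Action.KEEP;
        }
        if (currentState.equals(fromState)) {
            // Transition is in progress or has not started yet: do nothing.
            return Action.KEEP;
        }
        // Current state matches neither end of the transition: delete immediately.
        return Action.DELETE;
    }
}
```

    In the description above the deletion itself is asynchronous, so a caller would hand a
    DELETE result to a background task rather than block the pipeline on it.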

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhan849/helix harry/controller-msg-dedup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #156
    
----
commit 9f789dee0b17886bd97ebf4cc14e9d867043183d
Author: Harry Zhang <zhan849@...>
Date:   2018-03-21T01:47:02Z

    [HELIX-682] controller should delete obsolete messages with timeout to unblock state transition

----


> Stale message should not prevent controller from rebalancing resource
> ---------------------------------------------------------------------
>
>                 Key: HELIX-682
>                 URL: https://issues.apache.org/jira/browse/HELIX-682
>             Project: Apache Helix
>          Issue Type: Bug
>            Reporter: Hao Zhang
>            Priority: Major
>
> Currently, during MessageGenerationPhase, we skip rebalancing when there is a pending message.
> Although we assume that participants delete messages when they finish the task, a participant
> can fail to do so when ZK is unstable, which leaves the message undeleted and blocks rebalancing.
> Ideally, the controller should try to delete the message as well: if the partition's current
> state is the same as the message's toState, or a completely invalid message remains, the
> controller should try to delete the message to unblock rebalancing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
