helix-commits mailing list archives

From "Shi Lu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HELIX-26) Better support for handling network partition and process freeze
Date Tue, 16 Apr 2013 00:22:16 GMT

    [ https://issues.apache.org/jira/browse/HELIX-26?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632414#comment-13632414 ]

Shi Lu commented on HELIX-26:
-----------------------------

Pushing this work item to 0.6.2, since it is an improvement and no one depends on it yet.
                
> Better support for handling network partition and process freeze
> ----------------------------------------------------------------
>
>                 Key: HELIX-26
>                 URL: https://issues.apache.org/jira/browse/HELIX-26
>             Project: Apache Helix
>          Issue Type: Improvement
>    Affects Versions: 0.6.0-incubating
>            Reporter: kishore gopalakrishna
>            Assignee: Swaroop Jagadish
>             Fix For: 0.6.2-incubating
>
>
> Handling network partitions is tricky in distributed systems. ZooKeeper lets us solve this to some degree through heartbeats, but that is not sufficient in large-scale systems with many nodes. One of the problems is that once the client detects a disconnect, which happens on the client side, its options are:
> 1. Put yourself in a paused state until you reconnect.
> 2. Continue whatever you are doing until notified of session expiry.
> Unfortunately, 1 is too aggressive and 2 is too passive. Since Helix comes with a centralized controller, it is possible to have a middle-ground solution: once the participant receives a disconnect event, it can check with the coordinator(s)/peers to determine whether it can continue operating.
> The challenge here is for the node to detect whether it belongs to the same partition as the coordinator. So its goal is to reach the controller; if it cannot reach the controller, it has to disable/fence itself.
> As of now, Helix simply reports whether it is disconnected from the cluster, and the user can choose either 1) or 2).
> This JIRA aims to investigate better ways to enhance network partition detection.
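
For illustration only, the middle-ground option described in the issue could be sketched roughly as follows. This is not Helix code and does not use any Helix or ZooKeeper API: the names `Action`, `can_reach`, and `on_disconnect` are hypothetical, and a plain TCP connect stands in for whatever reachability check a real implementation would use. A successful probe here only suggests the participant shares a partition with a controller; it does not prove it.

```python
import socket
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a participant after a disconnect event."""
    CONTINUE = "continue"  # keep serving: a controller is still reachable
    FENCE = "fence"        # disable/fence itself: no controller reachable


def can_reach(host, port, timeout_s=1.0):
    """Return True if a TCP connection to (host, port) succeeds in time.

    Assumption: a TCP connect is an acceptable stand-in for "can talk to
    the controller". A real check would likely be an application-level ping.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def on_disconnect(controllers, probe=can_reach):
    """Decide what to do once a disconnect event has been received.

    controllers: iterable of (host, port) pairs for coordinator(s)/peers.
    probe: callable(host, port) -> bool, injectable for testing.
    Returns CONTINUE if any controller answers, otherwise FENCE.
    """
    for host, port in controllers:
        if probe(host, port):
            return Action.CONTINUE
    # No controller reachable (or none configured): assume we are on the
    # wrong side of the partition and fence ourselves.
    return Action.FENCE
```

Passing the probe as a parameter keeps the decision logic testable without a network; for example, `on_disconnect([("ctrl-1", 1234)], probe=lambda h, p: False)` returns `Action.FENCE`, where the host and port are made-up values.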

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
