incubator-s4-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kishore g <>
Subject Re: S4 Communication Layer using ZooKeeper
Date Wed, 03 Oct 2012 06:30:22 GMT
Yes, you can restart the nodes automatically that previously died. But keep
in mind that events may be dropped for that partition until the node is
restarted. I think we have some configuration on the retry size queue and
you can configure this based on how long it would take to automatically
restart the node.Make sure there is enough memory on all nodes to keep the
events in queue until the node comes back up.

Another thing to keep in mind is if multiple nodes fail and you restart
them it is probably not guaranteed to pick up the partition that it had
picked up earlier.

On Tue, Oct 2, 2012 at 11:19 PM, Frank Zheng <>wrote:

> Hi Kishore,
> This describes very clearly. Thank you a lot!
> Now I have another question.
> When one active node dies, the standby node tries to grab the lock.
> What if no standby nodes are allowed? Under this assumption, is it
> possible to restart the node automatically which dies previously?
> Thanks.
> Frank
> On Wed, Oct 3, 2012 at 12:51 PM, kishore g <> wrote:
>> At a very high level, this is how cluster management works
>> Each s4 cluster has a name space reserved /clustername in zookeeper.
>> There is an initial setup process where one or many znodes are created
>> under /clustername/tasks. When nodes join the cluster they check if some
>> one has already claimed a task by looking at /clustername/process/, if not
>> it grabs the lock by creating an ephemeral node under
>> /clustername/process/. If all tasks are taken it becomes a standby node.
>> When any active node dies, the standby node gets notified and tries to grab
>> the lock.
>> We can provide more details, if you can let us know which aspect of
>> cluster management mechanism you are interested in.
>> Thanks,
>> Kishore G
>>  On Tue, Oct 2, 2012 at 9:17 PM, Frank Zheng <>wrote:
>>> Hi All,
>>> I am exploring the cluster management mechanism and fault tolerance of
>>> S4.
>>> I saw that S4 used ZooKeeper in the communication layer. But it seems
>>> not very clear in that pater, " S4: Distributed Stream Computing Platform".
>>> I tried to search the reference "[15] Communication layer using
>>> ZooKeeper, Yahoo! Inc. Tech. Rep., 2009", but it is not available.
>>> Could anyone introduce me the role of ZooKeeper in S4, and the cluster
>>> management mechanism in detail?
>>> Thanks.
>>> Sincerely,
>>> Frank
> --
> Sincerely,
> Zheng Yu
> Mobile:  (852) 60670059
> Email:

View raw message