incubator-s4-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frank Zheng <bearzheng2...@gmail.com>
Subject Re: S4 Communication Layer using ZooKeeper
Date Wed, 03 Oct 2012 06:37:31 GMT
Hi Kishore,

Could you point out which configuration is related to the retry size queue,
e.g. the name?

Thanks.
Frank

On Wed, Oct 3, 2012 at 2:30 PM, kishore g <g.kishore@gmail.com> wrote:

> Yes, you can restart the nodes automatically that previously died. But
> keep in mind that events may be dropped for that partition until the node
> is restarted. I think we have some configuration on the retry size queue
> and you can configure this based on how long it would take to automatically
> restart the node.Make sure there is enough memory on all nodes to keep the
> events in queue until the node comes back up.
>
> Another thing to keep in mind is if multiple nodes fail and you restart
> them it is probably not guaranteed to pick up the partition that it had
> picked up earlier.
>
> On Tue, Oct 2, 2012 at 11:19 PM, Frank Zheng <bearzheng2011@gmail.com>wrote:
>
>> Hi Kishore,
>>
>> This describes very clearly. Thank you a lot!
>>
>> Now I have another question.
>> When one active node dies, the standby node tries to grab the lock.
>> What if no standby nodes are allowed? Under this assumption, is it
>> possible to restart the node automatically which dies previously?
>>
>> Thanks.
>> Frank
>>
>> On Wed, Oct 3, 2012 at 12:51 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>>> At a very high level, this is how cluster management works
>>>
>>> Each s4 cluster has a name space reserved /clustername in zookeeper.
>>> There is an initial setup process where one or many znodes are created
>>> under /clustername/tasks. When nodes join the cluster they check if some
>>> one has already claimed a task by looking at /clustername/process/, if not
>>> it grabs the lock by creating an ephemeral node under
>>> /clustername/process/. If all tasks are taken it becomes a standby node.
>>> When any active node dies, the standby node gets notified and tries to grab
>>> the lock.
>>>
>>> We can provide more details, if you can let us know which aspect of
>>> cluster management mechanism you are interested in.
>>>
>>> Thanks,
>>> Kishore G
>>>
>>>  On Tue, Oct 2, 2012 at 9:17 PM, Frank Zheng <bearzheng2011@gmail.com>wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am exploring the cluster management mechanism and fault tolerance of
>>>> S4.
>>>> I saw that S4 used ZooKeeper in the communication layer. But it seems
>>>> not very clear in that pater, " S4: Distributed Stream Computing Platform".
>>>> I tried to search the reference "[15] Communication layer using
>>>> ZooKeeper, Yahoo! Inc. Tech. Rep., 2009", but it is not available.
>>>> Could anyone introduce me the role of ZooKeeper in S4, and the cluster
>>>> management mechanism in detail?
>>>>
>>>> Thanks.
>>>>
>>>> Sincerely,
>>>> Frank
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Sincerely,
>> Zheng Yu
>> Mobile:  (852) 60670059
>> Email:    bearzheng2011@gmail.com
>>
>>
>>
>>
>


-- 
Sincerely,
Zheng Yu
Mobile:  (852) 60670059
Email:    bearzheng2011@gmail.com

Mime
View raw message