incubator-s4-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Frank Zheng <bearzheng2...@gmail.com>
Subject Re: S4 Communication Layer using ZooKeeper
Date Wed, 03 Oct 2012 07:59:42 GMT
Hi Matthieu,

If there is no "retry queue", is it possible to restart the node
automatically which dies previously?

Thanks
Frank

On Wed, Oct 3, 2012 at 3:45 PM, Matthieu Morel <mmorel@apache.org> wrote:

> Hi,
>
> Actually currently there is no "retry queue" (we had a prototype mechanism
> but removed it due to some implementation issues). Therefore if there is a
> node failure, you might lose some messages until a failover node is
> reassigned to the corresponding partition, and this assignment is notified
> to sender nodes.
>
> However, even though you might lose some events during failover, you can
> recover previous state through the checkpointing mechanism.
>
> Regards,
>
>
> Matthieu
>
>
>
> On 10/3/12 8:37 AM, Frank Zheng wrote:
>
>> Hi Kishore,
>>
>> Could you point out which configuration is related to the retry size
>> queue, e.g. the name?
>>
>> Thanks.
>> Frank
>>
>> On Wed, Oct 3, 2012 at 2:30 PM, kishore g <g.kishore@gmail.com
>> <mailto:g.kishore@gmail.com>> wrote:
>>
>>     Yes, you can restart the nodes automatically that previously died.
>>     But keep in mind that events may be dropped for that partition until
>>     the node is restarted. I think we have some configuration on the
>>     retry size queue and you can configure this based on how long it
>>     would take to automatically restart the node.Make sure there is
>>     enough memory on all nodes to keep the events in queue until the
>>     node comes back up.
>>
>>     Another thing to keep in mind is if multiple nodes fail and you
>>     restart them it is probably not guaranteed to pick up the partition
>>     that it had picked up earlier.
>>
>>     On Tue, Oct 2, 2012 at 11:19 PM, Frank Zheng
>>     <bearzheng2011@gmail.com <mailto:bearzheng2011@gmail.**com<bearzheng2011@gmail.com>>>
>> wrote:
>>
>>         Hi Kishore,
>>
>>         This describes very clearly. Thank you a lot!
>>
>>         Now I have another question.
>>         When one active node dies, the standby node tries to grab the
>> lock.
>>         What if no standby nodes are allowed? Under this assumption, is
>>         it possible to restart the node automatically which dies
>> previously?
>>
>>         Thanks.
>>         Frank
>>
>>         On Wed, Oct 3, 2012 at 12:51 PM, kishore g <g.kishore@gmail.com
>>         <mailto:g.kishore@gmail.com>> wrote:
>>
>>             At a very high level, this is how cluster management works
>>
>>             Each s4 cluster has a name space reserved /clustername in
>>             zookeeper. There is an initial setup process where one or
>>             many znodes are created under /clustername/tasks. When nodes
>>             join the cluster they check if some one has already claimed
>>             a task by looking at /clustername/process/, if not it grabs
>>             the lock by creating an ephemeral node under
>>             /clustername/process/. If all tasks are taken it becomes a
>>             standby node. When any active node dies, the standby node
>>             gets notified and tries to grab the lock.
>>
>>             We can provide more details, if you can let us know which
>>             aspect of cluster management mechanism you are interested in.
>>
>>             Thanks,
>>             Kishore G
>>
>>             On Tue, Oct 2, 2012 at 9:17 PM, Frank Zheng
>>             <bearzheng2011@gmail.com <mailto:bearzheng2011@gmail.**com<bearzheng2011@gmail.com>
>> >>
>>
>>             wrote:
>>
>>                 Hi All,
>>
>>                 I am exploring the cluster management mechanism and
>>                 fault tolerance of S4.
>>                 I saw that S4 used ZooKeeper in the communication layer.
>>                 But it seems not very clear in that pater, " S4:
>>                 Distributed Stream Computing Platform".
>>                 I tried to search the reference "[15] Communication
>>                 layer using ZooKeeper, Yahoo! Inc. Tech. Rep., 2009",
>>                 but it is not available.
>>                 Could anyone introduce me the role of ZooKeeper in S4,
>>                 and the cluster management mechanism in detail?
>>
>>                 Thanks.
>>
>>                 Sincerely,
>>                 Frank
>>
>>
>>
>>
>>
>>
>>
>>         --
>>         Sincerely,
>>         Zheng Yu
>>         Mobile:  (852) 60670059
>>         Email: bearzheng2011@gmail.com <mailto:bearzheng2011@gmail.**com<bearzheng2011@gmail.com>
>> >
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Sincerely,
>> Zheng Yu
>> Mobile:  (852) 60670059
>> Email: bearzheng2011@gmail.com <mailto:bearzheng2011@gmail.**com<bearzheng2011@gmail.com>
>> >
>>
>>
>>
>>
>


-- 
Sincerely,
Zheng Yu
Mobile:  (852) 60670059
Email:    bearzheng2011@gmail.com

Mime
View raw message