incubator-s4-user mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: S4 Communication Layer using ZooKeeper
Date Tue, 09 Oct 2012 08:56:06 GMT
On 10/9/12 5:14 AM, Frank Zheng wrote:
> Hi,
>
> Could anyone explain in detail "The topology is written to ZooKeeper,
> and each running node watches the ZooKeeper node(s) to keep track of any
> changes" ?

The idea is that deployment metadata is written to ZooKeeper, in 
particular the available tasks (partitions). Each S4 node tries to acquire 
exclusive access to a given partition using ZooKeeper's semantics and 
guarantees. When that happens, it means that a znode was created (in a 
specific directory) with the S4 node's information, and a ZooKeeper 
notification is propagated (through watches) to all live S4 nodes.
You can read ZooKeeper's documentation for more information about the 
semantics and guarantees it provides.
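
For illustration, here is a minimal, hypothetical sketch of that claim
mechanism using the plain ZooKeeper Java API. The znode paths
(/cluster1/tasks, /cluster1/process), the connection string and the data
stored in the claim znode are assumptions made for the example, not
necessarily S4's actual layout:

    // Hypothetical sketch: claim a free partition by creating an ephemeral znode.
    // Paths (/cluster1/tasks, /cluster1/process) and the stored data are examples.
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class PartitionClaimSketch {
        public static void main(String[] args) throws Exception {
            // The watcher passed here receives notifications about topology changes.
            ZooKeeper zk = new ZooKeeper("localhost:2181", 10000,
                    event -> System.out.println("topology event: " + event));

            // Deployment metadata: one znode per task (partition) under .../tasks.
            List<String> tasks = zk.getChildren("/cluster1/tasks", false);
            for (String task : tasks) {
                try {
                    // An EPHEMERAL znode disappears when this node's session dies,
                    // which is how the other nodes learn about the failure.
                    zk.create("/cluster1/process/" + task,
                              "host:port of this node".getBytes(),
                              ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.EPHEMERAL);
                    System.out.println("claimed partition " + task);
                    break;
                } catch (KeeperException.NodeExistsException alreadyTaken) {
                    // Another node already holds this partition; try the next one.
                }
            }
            // Keep a watch on the assignments so we are notified of any change.
            zk.getChildren("/cluster1/process", true);
        }
    }

The key point is the EPHEMERAL create: if the claiming node's session ends
(crash, network partition, ...), ZooKeeper removes the znode and notifies
every node watching that directory.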

>
> Without any standby nodes, could ZooKeeper first start a new node when a
> working active node fails?

ZooKeeper is "just" a coordination system, it cannot "start" a node. 
However, you could add a process that watches the topology and implements 
the logic to start a new S4 node when it detects such a need (through 
ZooKeeper's notifications).

The easiest solution, though, is to overprovision and rely on standby nodes.
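
As a rough, hypothetical sketch of such a supervisor process (the znode
paths, connection string and the start-node launch command are placeholders,
not part of S4):

    // Hypothetical sketch of an external supervisor: watch the assignments and
    // start a replacement S4 node when a partition is left unclaimed.
    // The launch command below is a placeholder, not an actual S4 script.
    import java.util.List;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class TopologySupervisorSketch implements Watcher {
        private final ZooKeeper zk;

        public TopologySupervisorSketch(String connectString) throws Exception {
            this.zk = new ZooKeeper(connectString, 10000, this);
            check();
        }

        @Override
        public void process(WatchedEvent event) {
            try {
                check(); // re-check (and re-arm the watch) on every notification
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private void check() throws Exception {
            List<String> tasks  = zk.getChildren("/cluster1/tasks", false);
            List<String> claims = zk.getChildren("/cluster1/process", this); // watch
            if (claims.size() < tasks.size()) {
                // Fewer claims than tasks: launch a node that will grab a free one.
                new ProcessBuilder("/path/to/start-s4-node.sh").inheritIO().start();
            }
        }

        public static void main(String[] args) throws Exception {
            new TopologySupervisorSketch("localhost:2181");
            Thread.sleep(Long.MAX_VALUE); // keep the supervisor alive
        }
    }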


Matthieu


>
> Thanks.
>
> On Wed, Oct 3, 2012 at 9:27 PM, Matthieu Morel <mmorel@apache.org> wrote:
>
>     We rely on notifications from ZooKeeper to detect node failures.
>     Typically you would use extra standby nodes, which will pick up the
>     unassigned partitions upon failures.
>
>     If you want to restart the same node, you could also do that
>     automatically (with something like daemontools, for instance), but as
>     Kishore mentioned, there is no guarantee that it will pick the same
>     partition as before.
>
>     Anyway, if a node is considered failed, it might also be due to a
>     hardware failure or a network partition, so it might not make sense to
>     restart the same S4 node on the same machine.
>
>     More details about fault tolerance can be found here:
>     https://cwiki.apache.org/confluence/display/S4/Fault+tolerance+in+S4
>
>
>     Hope this helps,
>
>     Matthieu
>
>
>     On 10/3/12 9:59 AM, Frank Zheng wrote:
>
>         Hi Matthieu,
>
>         If there is no "retry queue", is it possible to automatically
>         restart the node that died previously?
>
>         Thanks
>         Frank
>
>         On Wed, Oct 3, 2012 at 3:45 PM, Matthieu Morel
>         <mmorel@apache.org> wrote:
>
>              Hi,
>
>              Actually, there is currently no "retry queue" (we had a
>              prototype mechanism but removed it due to some implementation
>              issues). Therefore, if there is a node failure, you might lose
>              some messages until a failover node is reassigned to the
>              corresponding partition and this assignment is notified to the
>              sender nodes.
>
>              However, even though you might lose some events during
>              failover, you can recover previous state through the
>              checkpointing mechanism.
>
>              Regards,
>
>
>              Matthieu
>
>
>
>              On 10/3/12 8:37 AM, Frank Zheng wrote:
>
>                  Hi Kishore,
>
>                  Could you point out which configuration is related to the
>                  retry queue size, e.g. its name?
>
>                  Thanks.
>                  Frank
>
>                  On Wed, Oct 3, 2012 at 2:30 PM, kishore g
>                  <g.kishore@gmail.com> wrote:
>
>                       Yes, you can automatically restart the nodes that
>                       previously died. But keep in mind that events may be
>                       dropped for that partition until the node is
>                       restarted. I think we have some configuration for the
>                       retry queue size, and you can configure this based on
>                       how long it would take to automatically restart the
>                       node. Make sure there is enough memory on all nodes
>                       to keep the events in the queue until the node comes
>                       back up.
>
>                       Another thing to keep in mind is that if multiple
>                       nodes fail and you restart them, a node is not
>                       guaranteed to pick up the partition that it had
>                       picked up earlier.
>
>                       On Tue, Oct 2, 2012 at 11:19 PM, Frank Zheng
>                       <bearzheng2011@gmail.com> wrote:
>
>                           Hi Kishore,
>
>                           This describes it very clearly. Thank you a lot!
>
>                           Now I have another question.
>                           When one active node dies, the standby node
>                           tries to grab the lock. What if no standby nodes
>                           are available? Under this assumption, is it
>                           possible to automatically restart the node that
>                           previously died?
>
>                           Thanks.
>                           Frank
>
>                           On Wed, Oct 3, 2012 at 12:51 PM, kishore g
>                           <g.kishore@gmail.com> wrote:
>
>                               At a very high level, this is how cluster
>                               management works.
>
>                               Each S4 cluster has a namespace reserved in
>                               ZooKeeper, /clustername. There is an initial
>                               setup process where one or many znodes are
>                               created under /clustername/tasks. When nodes
>                               join the cluster, they check whether someone
>                               has already claimed a task by looking at
>                               /clustername/process/; if not, a node grabs
>                               the lock by creating an ephemeral znode under
>                               /clustername/process/. If all tasks are
>                               taken, the node becomes a standby node. When
>                               any active node dies, the standby node gets
>                               notified and tries to grab the lock.
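
To make the standby behaviour described above concrete, here is a
hypothetical sketch of a standby node that keeps a watch on the claims
directory and retries claiming a task whenever ZooKeeper signals a change.
The cluster name "cluster1", the paths and the connection string are
illustrative, not S4's actual configuration:

    // Hypothetical sketch of a standby node: watch /cluster1/process and retry
    // claiming a task whenever the set of claims changes (e.g. an active node's
    // ephemeral znode disappears because the node died).
    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class StandbyNodeSketch implements Watcher {
        private final ZooKeeper zk;

        public StandbyNodeSketch(String connectString) throws Exception {
            this.zk = new ZooKeeper(connectString, 10000, this);
            tryToClaim();
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                try {
                    tryToClaim(); // an active node may have died; try again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

        private void tryToClaim() throws Exception {
            List<String> tasks  = zk.getChildren("/cluster1/tasks", false);
            List<String> claims = zk.getChildren("/cluster1/process", this); // re-arm
            for (String task : tasks) {
                if (!claims.contains(task)) {
                    try {
                        zk.create("/cluster1/process/" + task, new byte[0],
                                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                        System.out.println("took over partition " + task);
                        return;
                    } catch (KeeperException.NodeExistsException raced) {
                        // Another standby won the race for this task; keep looking.
                    }
                }
            }
            // No free task: stay on standby, the watch will fire on the next change.
        }

        public static void main(String[] args) throws Exception {
            new StandbyNodeSketch("localhost:2181");
            Thread.sleep(Long.MAX_VALUE); // wait for notifications
        }
    }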
>
>                               We can provide more details if you let us
>                               know which aspect of the cluster management
>                               mechanism you are interested in.
>
>                               Thanks,
>                               Kishore G
>
>                               On Tue, Oct 2, 2012 at 9:17 PM, Frank Zheng
>                               <bearzheng2011@gmail.com> wrote:
>
>                                   Hi All,
>
>                                   I am exploring the cluster management
>                                   mechanism and fault tolerance of S4.
>                                   I saw that S4 uses ZooKeeper in the
>                                   communication layer, but this does not
>                                   seem very clear in the paper "S4:
>                                   Distributed Stream Computing Platform".
>                                   I tried to find the reference "[15]
>                                   Communication layer using ZooKeeper,
>                                   Yahoo! Inc. Tech. Rep., 2009", but it is
>                                   not available.
>                                   Could anyone explain to me the role of
>                                   ZooKeeper in S4, and the cluster
>                                   management mechanism, in detail?
>
>                                   Thanks.
>
>                                   Sincerely,
>                                   Frank
>
>                           --
>                           Sincerely,
>                           Zheng Yu
>                           Mobile: (852) 60670059
>                           Email: bearzheng2011@gmail.com
>
>                  --
>                  Sincerely,
>                  Zheng Yu
>                  Mobile: (852) 60670059
>                  Email: bearzheng2011@gmail.com
>
>         --
>         Sincerely,
>         Zheng Yu
>         Mobile: (852) 60670059
>         Email: bearzheng2011@gmail.com
>
> --
> Sincerely,
> Zheng Yu
> Mobile: (852) 60670059
> Email: bearzheng2011@gmail.com
>
>
>

