kafka-users mailing list archives

From Le Cyberian <lecyber...@gmail.com>
Subject Re: Having 4 Node Kafka Cluster
Date Mon, 06 Mar 2017 19:37:34 GMT
Hi Hans,

Thank you for your response. I understand. It's not possible to have a third
rack/server room at the moment, as the requirement is to have redundancy
between the two. I already tried to get one :-/

Is it possible to have a Zookeeper ensemble (3 nodes) in one server room and
the same in the other, with some sort of master-master replication between
them? Would that make sense, if it's possible? Since in that case both would
have the same config, split brain theoretically should not happen.

I haven't done this Zookeeper 3rd-node hack before :) I guess I need to
play around with it for a while to get it properly documented, functional,
and tested :)

Thanks again!

Le

On Mon, Mar 6, 2017 at 8:22 PM, Hans Jespersen <hans@confluent.io> wrote:

>
> Is there any way you can find a third rack/server room/power supply nearby
> just for the 1 extra zookeeper node? You don’t have to put any kafka
> brokers there, just a single zookeeper.  It’s less likely to have a 3-way
> split brain because of a network partition. It’s so much cleaner with 3
> availability zones because everything would be automatic failover. This is
> how most people run when deployed in Amazon.
>
> Barring that, I would say the next best thing would be 3 zookeepers in one
> zone and 2 zookeepers in the other zone so it will auto-failover if the 2
> zk zone fails. If the 3 zk zone fails you can setup a well tested set of
> manual steps to carefully configure a 3rd zookeeper clone (which matches
> the id of one of the failed nodes) and still get your system back up and
> running. If this is not something you have done before I suggest getting a
> few days of expert consulting to have someone help you set it up, test it,
> and document the proper failover and recovery procedures.
>
> -hans
>
>
>
>
> > On Mar 6, 2017, at 10:44 AM, Le Cyberian <lecyberian@gmail.com> wrote:
> >
> > Thanks Hans and Alexander for taking the time out, and for your
> > responses.
> >
> > I now understand the risks and the possible outcome of the desired
> > setup.
> >
> > What would be better, in your opinion, to get failover (active-active)
> > between both of these server rooms and avoid switching to the clone /
> > 3rd zookeeper?
> >
> > I mean, even with 5 nodes (3 in one server room and 2 in the other),
> > there would still be a problem with zookeeper majority leader election
> > if the server room that has 3 nodes goes down.
> >
> > Is there some way to achieve this?
> >
> > Thanks again!
> >
> > Lee
> >
> > On Mon, Mar 6, 2017 at 4:16 PM, Alexander Binzberger <
> > alexander.binzberger@wingcon.com> wrote:
> >
> >> I agree that this is one cluster, but having one additional ZK node
> >> per site does not help (as far as I understand ZK).
> >>
> >> 3 out of 6 is also not a majority. So I think you mean 3/5 with a
> >> cloned 3rd one. That would mean manually switching in the clone to
> >> regain majority, which can cause issues again:
> >> 1. You are actually building a master/slave ZK with manual
> >> switch-over.
> >> 2. While switching the clone from room to room you would have
> >> downtime.
> >> 3. If you switch on both ZK node clones at the same time (by mistake)
> >> you are screwed.
> >> 4. If you "switch" clones instead of moving one with all its data on
> >> disk, you create a split brain from which you have to recover first.
> >>
> >> So if you lose the connection between the rooms / the rooms get
> >> separated / you lose one room:
> >> * You (might) need manual intervention
> >> * You lose automatic fail-over between the rooms
> >> * You might face a complete outage if your "master" room with the
> >> active 3rd node is hit.
> >> Actually this is the same scenario as with 2/3 nodes spread over two
> >> locations.
> >>
> >> What you need for real fault tolerance is a third cross-connected
> >> location, with your 3 or 5 ZK nodes distributed over all three.
> >> Or live with a possible outage in such a scenario.
> >>
> >> Additional Hints:
> >> * You can run any number of Kafka brokers on a ZK cluster. In your
> >> case this could be 4 Kafka brokers on 3 ZK nodes.
> >> * You should set topic replication to 2 (can be done at any time) and
> >> some other producer/broker settings to ensure your messages will not
> >> get lost in switch-over cases.
> >> * The ZK service does not react nicely to a full disk.
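The producer/broker durability settings hinted at here would look roughly
like the following; the exact values are a trade-off between availability
and durability, so treat this as a sketch:

```properties
# topic level
replication.factor=2
min.insync.replicas=1     # with RF=2, raising this to 2 would block writes
                          # whenever either replica is down

# producer level
acks=all                  # wait for all in-sync replicas to acknowledge
retries=2147483647        # keep retrying instead of dropping messages

# broker level
unclean.leader.election.enable=false  # prefer unavailability over data loss
```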
> >>
> >>
> >>
> >> Am 06.03.2017 um 15:10 schrieb Hans Jespersen:
> >>
> >>> In that case it's really one cluster. Make sure to set different rack
> >>> ids for each server room so kafka will ensure that the replicas
> >>> always span both floors and you don't lose availability of data if a
> >>> server room goes down.
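In server.properties terms, the rack assignment is one line per broker
(requires Kafka 0.10+; the broker ids and rack names below are
hypothetical):

```properties
# Site A brokers
broker.id=1
broker.rack=room-a

# Site B brokers
broker.id=3
broker.rack=room-b
```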
> >>> You will have to configure one additional zookeeper node in each site
> >>> which you will only ever start up if a site goes down, because
> >>> otherwise 2 of 4 zookeeper nodes is not a quorum. Again, you would be
> >>> better off with 3 nodes, because then you would only have to do this
> >>> in the site that has the single active node.
> >>>
> >>> -hans
> >>>
> >>>
> >>> On Mar 6, 2017, at 5:57 AM, Le Cyberian <lecyberian@gmail.com> wrote:
> >>>>
> >>>> Hi Hans,
> >>>>
> >>>> Thank you for your reply.
> >>>>
> >>>> It's basically two different server rooms on different floors,
> >>>> connected by fiber, so it's almost like a local connection between
> >>>> them, with no network latency / lag.
> >>>>
> >>>> If I do Mirror Maker / Replicator then I will not be able to use
> >>>> both sides at the same time for writes / producers, because the
> >>>> consumers / producers will request from all of them.
> >>>>
> >>>> BR,
> >>>>
> >>>> Lee
> >>>>
> >>>> On Mon, Mar 6, 2017 at 2:50 PM, Hans Jespersen <hans@confluent.io>
> >>>> wrote:
> >>>>
> >>>>> What do you mean when you say you have "2 sites not datacenters"?
> >>>>> You should be very careful configuring a stretch cluster across
> >>>>> multiple sites. What is the RTT between the two sites? Why do you
> >>>>> think that Mirror Maker (or Confluent Replicator) would not work
> >>>>> between the sites and yet you think a stretch cluster will work?
> >>>>> That seems wrong.
> >>>>>
> >>>>> -hans
> >>>>>
> >>>>> /**
> >>>>> * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
> >>>>> * hans@confluent.io (650)924-2670
> >>>>> */
> >>>>>
> >>>>> On Mon, Mar 6, 2017 at 5:37 AM, Le Cyberian <lecyberian@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi Guys,
> >>>>>>
> >>>>>> Thank you very much for your reply.
> >>>>>>
> >>>>>> The scenario which I have to implement is that I have 2 sites, not
> >>>>>> datacenters, so mirror maker would not work here.
> >>>>>>
> >>>>>> There will be 4 nodes in total, 2 in Site A and 2 in Site B. The
> >>>>>> idea is to have an Active-Active setup along with fault tolerance,
> >>>>>> so that if one of the sites goes down the operations stay normal.
> >>>>>>
> >>>>>> In this case, if I go ahead with a 4-node cluster of both
> >>>>>> zookeeper and kafka, it will give failover tolerance for 1 node
> >>>>>> only.
> >>>>>>
> >>>>>> What do you suggest to do in this case? Because to divide between
> >>>>>> 2 sites it needs to be an even number, if that makes sense? Also,
> >>>>>> if possible, some help regarding partitions for topics and the
> >>>>>> replication factor would be appreciated.
> >>>>>>
> >>>>>> I already have Kafka running with quite a few topics having
> >>>>>> replication factor 1 along with 1 default partition. Is there a
> >>>>>> way to repartition / increase partitions of existing topics when I
> >>>>>> migrate to the above setup? I think we can increase the
> >>>>>> replication factor with the Kafka rebalance tool.
> >>>>>>
> >>>>>> Thanks alot for your help and time looking into this.
> >>>>>>
> >>>>>> BR,
> >>>>>>
> >>>>>> Le
> >>>>>>
> >>>>>> On Mon, Mar 6, 2017 at 12:20 PM, Hans Jespersen <hans@confluent.io>
> >>>>>>
> >>>>> wrote:
> >>>>>
> >>>>>>> Jens,
> >>>>>>>
> >>>>>>> I think you are correct that a 4-node zookeeper ensemble can be
> >>>>>>> made to work, but it will be slightly less resilient than a
> >>>>>>> 3-node ensemble, because it can only tolerate 1 failure (same as
> >>>>>>> a 3-node ensemble) and the likelihood of node failures is higher
> >>>>>>> because there is 1 more node that could fail.
> >>>>>>> So it SHOULD be an odd number of zookeeper nodes (not MUST).
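The arithmetic behind this: a quorum is a strict majority, so the number of
tolerable failures only grows at each odd ensemble size. A quick sketch:

```shell
# quorum = floor(n/2) + 1; tolerated failures = n - quorum
for n in 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "$n nodes: quorum=$quorum, tolerates $(( n - quorum )) failure(s)"
done
# 3 nodes: quorum=2, tolerates 1 failure(s)
# 4 nodes: quorum=3, tolerates 1 failure(s)
# 5 nodes: quorum=3, tolerates 2 failure(s)
```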
> >>>>>>>
> >>>>>>> -hans
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Mar 6, 2017, at 12:20 AM, Jens Rantil <jens.rantil@tink.se>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Hi Hans,
> >>>>>>>>
> >>>>>>>> On Mon, Mar 6, 2017 at 12:10 AM, Hans Jespersen
> >>>>>>>> <hans@confluent.io> wrote:
> >>>>>>>>
> >>>>>>>>> A 4 node zookeeper ensemble will not even work. It MUST be an
> >>>>>>>>> odd number of zookeeper nodes to start.
> >>>>>>>>
> >>>>>>>> Are you sure about that? If Zookeeper doesn't run with four
> >>>>>>>> nodes, that means a running ensemble of three can't be
> >>>>>>>> live-migrated to other nodes (because that's done by increasing
> >>>>>>>> the ensemble and then reducing it, in the case of 3-node
> >>>>>>>> ensembles). IIRC, you can run four Zookeeper nodes, but that
> >>>>>>>> means the quorum will be three nodes, so there's no added
> >>>>>>>> benefit in terms of availability, since you can only lose one
> >>>>>>>> node, just like with a three-node cluster.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Jens
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Jens Rantil
> >>>>>>>> Backend engineer
> >>>>>>>> Tink AB
> >>>>>>>>
> >>>>>>>> Email: jens.rantil@tink.se
> >>>>>>>> Phone: +46 708 84 18 32
> >>>>>>>> Web: www.tink.se
> >>>>>>>>
> >>>>>>>> Facebook <https://www.facebook.com/#!/tink.se> Linkedin
> >>>>>>>> <http://www.linkedin.com/company/2735919?trk=vsrp_companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
> >>>>>>>> Twitter <https://twitter.com/tink>
> >>>>>>>>
> >>>>>>>
> >>>
> >>
>
>
