kafka-users mailing list archives

From Todd Palino <tpal...@gmail.com>
Subject Re: Kafka - Networking considerations
Date Mon, 09 May 2016 03:20:39 GMT
One thing to keep in mind is that the 18.5 Gbps number is across the entire
Kafka infrastructure at LinkedIn, not a single cluster. For a single
cluster, I’d say our peak right now is 4.7 Gbps inbound. However, because
Kafka is horizontally scalable, you can build a cluster to support what
you’re looking for.

At the most basic level, if you have gigabit network interfaces, 16 Gbps
requires at least 16 brokers. This assumes a replication factor of 1, which is
almost certainly not what you want to do (as there is no protection within
the cluster for a node failure). If you have RF=2, every bit written into
your cluster by producers is going to generate a bit outbound from a broker
for replication, and a bit inbound on the broker it’s being replicated to.
This means that your inbound throughput is halved, and you need at least 32
brokers. This is simplistic - it doesn’t take into account performance, or
retention on disk, both of which you’ll need to think about. You can’t run
the network interfaces saturated all the time, but you can get pretty close
without having performance problems.
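The broker-count arithmetic above can be sketched as a back-of-envelope calculation. This is a simplification under the same assumptions as the text (gigabit NICs, replication traffic sharing the same interfaces); the headroom parameter is my addition to reflect not running the links saturated:

```python
import math

def min_brokers(inbound_gbps, replication_factor, nic_gbps=1.0, headroom=1.0):
    """Minimum brokers so per-broker traffic fits on the NIC.

    Every bit produced is received once by the leader and received
    (replication_factor - 1) more times by followers, so total inbound
    across the cluster is inbound_gbps * replication_factor.
    """
    total_inbound = inbound_gbps * replication_factor
    usable_per_broker = nic_gbps * headroom
    return math.ceil(total_inbound / usable_per_broker)

print(min_brokers(16, 1))                # 16 brokers, as in the text
print(min_brokers(16, 2))                # 32 brokers with RF=2
print(min_brokers(16, 2, headroom=0.8))  # 40 brokers if links run at 80% max
```

The same simplifications apply: this ignores disk retention, CPU, and the outbound side entirely.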

You also need to be concerned about your consumers here. If you have a
single consumer, there’s no real complication. It maps directly to your
producers. But I (for example) have 3 consumers (plus the replicas) on
average for any given message. That means I need 4 times as much outbound
bandwidth as I do inbound. If my producers write at 250 Mbps, I’m
saturating that 1 Gbps network link on the other side.
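The consumer fan-out example works out like this. A minimal sketch of the same arithmetic, assuming consumers fetch only from the leader (the standard behavior at the time of this thread):

```python
def outbound_gbps(inbound_gbps, consumers, replication_factor):
    # Each produced bit leaves a broker once per consumer, plus once per
    # follower replica (RF - 1) for intra-cluster replication.
    return inbound_gbps * (consumers + (replication_factor - 1))

# 250 Mbps in, 3 consumers, RF=2: 4x outbound = 1 Gbps,
# which saturates a gigabit link, as in the example above.
print(outbound_gbps(0.25, 3, 2))  # 1.0
```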

As far as laying out the brokers, this is where things will get a lot more
interesting. Ideally, you want the cluster to have as many failure domains
as possible, so you want to spread it out to lots of racks (to minimize the
impact of a power failure or a single switch failure, assuming a top of
rack switch). This means you have to account for the replication traffic
between brokers across racks. Spine-and-leaf network topologies work well
here because they provide big pipes between racks without requiring as many
dedicated cross-connects.
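As a concrete illustration of spreading replicas across failure domains: Kafka 0.10 added rack-aware replica assignment (KIP-36), where each broker declares its rack in `server.properties` and the controller spreads a partition's replicas across as many racks as it can at topic creation. The rack IDs below are placeholders, not from the thread:

```properties
# server.properties on a broker in the first rack (rack ID is a placeholder)
broker.id=1
broker.rack=rack-a

# server.properties on a broker in a different rack
# broker.id=2
# broker.rack=rack-b
```

With this set, losing one rack (power or top-of-rack switch) leaves at least one replica of each partition alive, provided RF is greater than 1.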

Really, you’re going to have to work with your network engineers to figure
out what you have to work with, and plan within that. It’s hard to stand on
the outside and give you a solid plan, because I don’t know what else is
going on on your networks, where your producers are, what your consumers
are doing, or what your performance needs to look like.


On Fri, May 6, 2016 at 4:07 AM, Andrew Backhouse <BackhouseAndrew@gmail.com>
wrote:
> Hello,
> I trust you are well.
> There's a wealth of great articles and presentations relating to the Kafka
> logic and design, thank you for these. But, I'm unable to find information
> relating to the network infrastructure that underpins Kafka. What I am
> trying to understand, and put my Network Engineers at ease about, when it
> comes to sizing our Kafka cluster is as follows:
> Looking at an example deployment e.g. LinkedIn and their Kafka deployment,
> it processes 18.5 Gbit/sec inbound.
> What design considerations were made to:
>    1. validate the network can support such throughput (inbound,
>    inter-cluster & outbound)
>    2. identify what network infrastructure config/layout can be used or
>    avoided to optimise for this level of traffic
> To help your answer, we're looking at potentially 16 Gbit/sec inbound which
> concerns our network team.
> If you can please share pointers to existing materials or specific details
> of your deployment, that will be great.
> Regards,
> Andrew

*Todd Palino*
Staff Site Reliability Engineer
Data Infrastructure Streaming

