kafka-users mailing list archives

From Jun Rao <...@confluent.io>
Subject Re: Kafka Replicated Partition Limits
Date Wed, 03 Jan 2018 22:50:52 GMT
Hi, Andrey,

Thanks for reporting the results. Which version of Kafka are you testing?
Also, it would be useful to know if you are testing the normal mode when
all replicas are up and in sync, or the failure mode when some of the
replicas are being restarted. Typically, ZK is only accessed in the failure
mode.

We have made some significant improvements in the failure mode by reducing
the logging overhead (KAFKA-6116) and making the ZK accesses asynchronous
(KAFKA-5642). These won't necessarily reduce the number of requests to ZK,
but they will allow better pipelining when accessing ZK.

In the normal mode, we are now discussing KIP-227, which could reduce the
overhead for replication and consumption when there are many partitions.

Jun

On Wed, Jan 3, 2018 at 1:48 PM, Andrey Falko <afalko@salesforce.com> wrote:

> Hi everyone,
>
> We are seeing more and more push from our Kafka users to support well
> over 10k replicated partitions. Ideally, we'd like to avoid running
> multiple clusters so that cluster management and monitoring stay simple.
> We started testing Kafka to see how many replicated partitions it could
> handle.
>
> We found that, with a produce-latency SLA of under 50 ms, Kafka starts
> going downhill at around 9k topics with 5 brokers. Each topic is
> replicated 3x in our test. The bottleneck appears to be ZooKeeper: after
> a certain period of time, the number of outstanding requests in ZK
> starts climbing at a linear rate. Slowing down the rate at which we
> create and produce to topics improves things, but doing that makes the
> system tougher to manage and use. We are happy to publish our detailed
> results with reproduction steps if anyone is interested.
>
> Has anyone overcome this problem and scaled beyond 9k replicated
> partitions? Does anyone have ZooKeeper tuning suggestions? Is ZooKeeper
> even the bottleneck?
>
> According to this post, we should have at most 300 3x-replicated
> partitions per broker:
> https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> Is anyone doing work to have Kafka support more than that?
>
> Best regards,
> Andrey Falko
> Salesforce.com
>
