spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Charles Chao <xpnc54byp...@gmail.com>
Subject Scaling Kafka Streaming to Thousands of Partitions
Date Sat, 25 May 2019 23:32:42 GMT
Hi,

We have been using Spark Kafka streaming for real time processing with
success. The scale of this stream has been increasing with data growth, and
we have been able to scale up by adding more brokers to the Kafka cluster,
adding more partitions to the topic, and adding more executors to the spark
streaming app.

At this time our biggest topic has about 750 partitions. And in every mini
batch of the streaming app, the driver will fetch the metadata from Kafka
regarding this topic and arrange the tasks. I wonder will this step become
a bottleneck, if we continue to scale in this way? Is there any best
practices in scaling up the streaming job?

Thanks,

Charles

Mime
View raw message