spark-user mailing list archives

From churly lin <chury...@gmail.com>
Subject The speed of Spark streaming reading data from kafka stays low
Date Mon, 13 Mar 2017 07:20:40 GMT
Hi all,

I am using *Spark Streaming (1.6.2)* + *Kafka (0.10.1.0)*. To be specific, I
read events from a Kafka topic using the *Spark Streaming direct approach*.
Kafka: *1 topic, 10 partitions*.
Spark Streaming: *10 executors*, matching the 10 Kafka partitions. The
*batch window time* is set to 60s.

Once the job is running, the Spark Streaming processing time is about 20s,
much less than the batch window size. But no matter how I change the input
rate of the Kafka producer (3000 events/sec, 4000 events/sec, 6000
events/sec), the input rate of Spark Streaming (the Kafka consumer) is always
about 3000 events/sec, which means the Spark Streaming (Kafka consumer) side
cannot keep up with the Kafka producer side. So, is there a way to increase
the throughput of the *Spark Streaming + Kafka (direct approach)* system?
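
To put rough numbers on the situation, here is a small back-of-the-envelope
sketch in Python. It only does arithmetic on the figures quoted above. It
assumes events are spread evenly across the partitions, which I have not
verified, and the function name batch_figures is just illustrative.

```python
# Back-of-the-envelope figures for the setup described above.
# Assumption (not verified): events are spread evenly across partitions.

BATCH_INTERVAL_S = 60        # batch window, seconds
PROCESSING_TIME_S = 20       # observed processing time per batch, seconds
OBSERVED_INPUT_RATE = 3000   # events/sec reported by Spark Streaming
PARTITIONS = 10              # Kafka partitions in the topic


def batch_figures(input_rate, batch_interval_s, processing_time_s, partitions):
    """Derive per-batch and per-partition figures from the observed rates."""
    events_per_batch = input_rate * batch_interval_s
    return {
        # events pulled from Kafka in one batch window
        "events_per_batch": events_per_batch,
        # share of one batch read from each partition (even split assumed)
        "events_per_partition_per_batch": events_per_batch // partitions,
        # consumption rate per partition, events/sec
        "rate_per_partition": input_rate / partitions,
        # events/sec the job works through while it is actually processing
        "processing_rate": events_per_batch / processing_time_s,
    }


figures = batch_figures(
    OBSERVED_INPUT_RATE, BATCH_INTERVAL_S, PROCESSING_TIME_S, PARTITIONS
)
for name, value in figures.items():
    print(f"{name}: {value}")
```

If this arithmetic is right, each batch holds about 180,000 events (roughly
300 events/sec per partition), and the job gets through them at about 9000
events/sec while it is processing. So processing time does not look like the
bottleneck; the limit seems to be on the reading side.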

I have tried increasing the Kafka partitions from 10 to 20 and, accordingly,
the executors from 10 to 20, but it didn't help.

Thanks.
