On Mon, Mar 13, 2017 at 3:20 AM, churly lin <churylin@gmail.com> wrote:
Hi all:
I am using Spark Streaming (1.6.2) + Kafka. To be specific, I read events from a Kafka topic using the Spark Streaming direct approach.
Kafka: 1 topic, 10 partitions.
Spark Streaming: 10 executors, matching the 10 Kafka partitions. The batch window is set to 60s.

Once the job is running, the Spark Streaming processing time is about 20s per batch, much less than the 60s batch window. But no matter how I changed the input rate of the Kafka producer (3000 events/sec, 4000 events/sec, 6000 events/sec), the input rate of Spark Streaming (the Kafka consumer) was always about 3000 events/sec, which means the Spark Streaming side couldn't keep up with the Kafka producer side. So, is there a way to increase the throughput of the Spark Streaming + Kafka (direct approach) system?
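
To put rough numbers on the gap, here is a small back-of-the-envelope sketch. It assumes the rates are constant and that events are spread evenly across the partitions; both are assumptions, not things I have measured. The function names are only illustrative.

```python
# Rough arithmetic for the numbers above. Assumes constant rates and an
# even spread of events across partitions (both are assumptions).

BATCH_INTERVAL_S = 60   # batch window from the setup above
PARTITIONS = 10         # Kafka partitions in the original setup
CONSUME_RATE = 3000     # observed Spark Streaming input rate, events/sec


def events_per_batch(rate_per_sec, batch_interval_s=BATCH_INTERVAL_S):
    """Events that arrive (or are read) during one batch window."""
    return rate_per_sec * batch_interval_s


def per_partition_rate(rate_per_sec, partitions=PARTITIONS):
    """Events/sec per partition if the load is spread evenly."""
    return rate_per_sec / partitions


def backlog_growth_per_batch(produce_rate, consume_rate=CONSUME_RATE,
                             batch_interval_s=BATCH_INTERVAL_S):
    """Unread events added per batch window when the producer is faster."""
    return max(0, produce_rate - consume_rate) * batch_interval_s


if __name__ == "__main__":
    for produce_rate in (3000, 4000, 6000):
        print(produce_rate,
              events_per_batch(CONSUME_RATE),
              per_partition_rate(CONSUME_RATE),
              backlog_growth_per_batch(produce_rate))
```

Under these assumptions, each 60s batch reads about 180,000 events (roughly 300 events/sec per partition), and with the producer at 6000 events/sec the unread backlog grows by about 180,000 events per batch.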

I have tried increasing the Kafka partitions from 10 to 20 and, accordingly, the executors from 10 to 20, but it didn't help.