spark-user mailing list archives

From Jiashuai Zhou (周家帅) <zji...@gmail.com>
Subject [Spark Kafka] How to update batch size of input dynamically for spark kafka consumer?
Date Tue, 03 Jan 2017 10:00:40 GMT
Hi,

I am an intermediate Spark user with some experience in large-scale data
processing. I posted this question on Stack Overflow but received no
response. My problem is as follows:

I use createDirectStream in my Spark Streaming application. The batch
interval is set to 7 seconds, and most of the time a batch job finishes
within about 5 seconds. In rare cases, however, a batch job takes about 60
seconds, which delays the batches queued behind it. To cut down the total
delay, I would like to process the backlog that accumulates behind a
delayed job in one larger batch, so the stream can catch up and return to
normal as soon as possible.
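
For context, this is roughly how the stream is set up (a minimal sketch
against the spark-streaming-kafka-0-10 API; the broker address, group id,
topic name, and the per-batch logic are placeholders, not my actual job):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-direct-stream")
    // 7-second batch interval, as described above
    val ssc = new StreamingContext(conf, Seconds(7))

    // Placeholder Kafka settings; values are illustrative only
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
    val topics = Array("example-topic")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    // Stand-in for the per-batch processing, which usually finishes in ~5s
    stream.foreachRDD { rdd =>
      println(s"Batch record count: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}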

So, I want to know whether there is some method to dynamically update or
merge the input batch size for Spark and Kafka when such a delay appears.
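
For reference, the rate-control settings I am aware of only cap the
per-batch input rather than enlarging it to absorb a backlog. Shown here
just to frame the question, extending the SparkConf in the sketch above
(the rate value is illustrative):

val conf = new SparkConf()
  .setAppName("kafka-direct-stream")
  // PID-based backpressure: adapts the ingest rate to recent batch durations
  .set("spark.streaming.backpressure.enabled", "true")
  // hard upper bound on records consumed per Kafka partition per second
  .set("spark.streaming.kafka.maxRatePerPartition", "10000")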

Many thanks for your help.

-- 
Jiashuai Zhou

School of Electronics Engineering and Computer Science,
Peking University
