spark-user mailing list archives

From Mal Edwin
Subject Re: Spark Streaming from Kafka, deal with initial heavy load.
Date Sat, 18 Mar 2017 20:37:43 GMT

You can enable backpressure (spark.streaming.backpressure.enabled) to handle this.
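
To make the suggestion concrete, here is a rough sketch of the settings involved. As far as I know, backpressure estimates its rate from batches that have already completed, so on its own it may not bound the very first batch of a direct stream. Setting spark.streaming.kafka.maxRatePerPartition as well gives a hard cap from the start. The two property names are real Spark settings. The values, the partition count, the batch interval, and the helper function are assumptions made up for illustration.

```python
# Sketch only: the two keys are real Spark configuration properties,
# but the values are illustrative and need tuning for your cluster.
spark_conf = {
    "spark.streaming.backpressure.enabled": "true",
    # Hard cap, in records per second per Kafka partition (direct approach).
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}


def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_sec):
    """Upper bound on records one batch can read under maxRatePerPartition.

    Hypothetical helper, not part of Spark.
    """
    return max_rate_per_partition * num_partitions * batch_interval_sec


# Assumed topic layout: 12 partitions, 10-second batch interval.
cap = max_records_per_batch(
    int(spark_conf["spark.streaming.kafka.maxRatePerPartition"]), 12, 10
)
print(cap)  # 120000

# The same settings rendered as spark-submit arguments.
submit_args = ["--conf {}={}".format(k, v) for k, v in spark_conf.items()]
print(" ".join(submit_args))
```

With a cap like this, the backlog is consumed over many bounded batches instead of one 40-hour batch, and backpressure can adjust the rate once the first batches have completed.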



On Mar 18, 2017, 12:53 AM -0400, sagarcasual . wrote:
> Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct approach. The
> streaming part works fine, but when we first start the job we have to deal with a huge Kafka
> message backlog (millions of messages). That first batch runs for over 40 hours, and after
> 12 hours or so it becomes very slow: it keeps crunching messages, but at a very low speed.
> Once the job has caught up, subsequent batches are fast because the load is tiny.
> Any idea how to avoid this problem?
