spark-user mailing list archives

From Evgeny Shishkin <>
Subject Re: KafkaInputDStream mapping of partitions to tasks
Date Thu, 27 Mar 2014 23:27:37 GMT

On 28 Mar 2014, at 02:10, Scott Clasen <> wrote:

> Thanks everyone for the discussion.
> Just to note, I restarted the job yet again, and this time there are indeed
> tasks being executed by both worker nodes. So the behavior does seem
> inconsistent/broken atm.
> Then I added a third node to the cluster, and a third executor came up, and
> everything broke :|

This is Kafka's high-level consumer. Try raising the number of rebalance retries (rebalance.max.retries).
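For illustration, here is a minimal sketch of the consumer settings involved. It assumes the Kafka 0.8 high-level consumer property names zookeeper.connect, group.id, rebalance.max.retries and rebalance.backoff.ms. The class, its method names and the numbers used are hypothetical examples, not tuned recommendations. The product of retries and backoff is only a rough bound on how long a consumer keeps retrying a rebalance before giving up.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds high-level consumer properties that could be
// passed as the kafkaParams map to Spark Streaming's Kafka receiver.
public class KafkaRebalanceParams {

    // Assumes Kafka 0.8 high-level consumer property names; values are illustrative.
    static Map<String, String> buildKafkaParams(String zkConnect, String groupId,
                                                int maxRetries, long backoffMs) {
        Map<String, String> params = new HashMap<>();
        params.put("zookeeper.connect", zkConnect);
        params.put("group.id", groupId);
        // How many times a consumer retries a failed rebalance before giving up.
        params.put("rebalance.max.retries", Integer.toString(maxRetries));
        // Pause between rebalance attempts, in milliseconds.
        params.put("rebalance.backoff.ms", Long.toString(backoffMs));
        return params;
    }

    // Rough upper bound on the retry window: retries * backoff, in milliseconds.
    static long rebalanceWindowMs(Map<String, String> params) {
        return Long.parseLong(params.get("rebalance.max.retries"))
                * Long.parseLong(params.get("rebalance.backoff.ms"));
    }

    public static void main(String[] args) {
        Map<String, String> params =
                buildKafkaParams("zk1:2181", "my-spark-group", 10, 5000L);
        System.out.println(params.get("rebalance.max.retries")); // 10
        System.out.println(rebalanceWindowMs(params));           // 50000
    }
}
```

The resulting map would go in as the kafkaParams argument of the KafkaUtils.createStream overload that accepts one. Property names and defaults have changed between Kafka releases, so check them against the consumer documentation for the version you run.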

Also, because this consumer is multi-threaded, it has some protection against this failure: it first waits for a while and then rebalances.
But on a Spark cluster I think this wait is not long enough.
If there were a way to wait for every Spark executor to start, rebalance, and only then start consuming, this issue would be less visible.
