spark-user mailing list archives

From Evgeny Shishkin <itparan...@gmail.com>
Subject Re: KafkaInputDStream mapping of partitions to tasks
Date Thu, 27 Mar 2014 23:27:37 GMT

On 28 Mar 2014, at 02:10, Scott Clasen <scott.clasen@gmail.com> wrote:

> Thanks everyone for the discussion.
> 
> Just to note, I restarted the job yet again, and this time there are indeed
> tasks being executed by both worker nodes. So the behavior does seem
> inconsistent/broken atm.
> 
> Then I added a third node to the cluster, and a third executor came up, and
> everything broke :|
> 
> 

This is Kafka’s high-level consumer. Try raising the rebalance retries.

Also, as this consumer is threaded, it has some protection against this failure: first it
waits for some time, and then rebalances.
But for a Spark cluster I think this time is not enough.
If there were a way to wait for every Spark executor to start, rebalance, and only then start
consuming, this issue would be less visible.
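In case it helps, a minimal sketch of raising those retries in the consumer properties. `rebalance.max.retries` and `rebalance.backoff.ms` are Kafka 0.8 high-level consumer settings; the ZooKeeper address, group id, and values here are placeholders, not tuned recommendations:

```java
import java.util.Properties;

public class KafkaRebalanceConfig {
    public static Properties consumerProps() {
        Properties props = new Properties();
        // assumption: your ZooKeeper quorum and consumer group id
        props.put("zookeeper.connect", "zk1:2181");
        props.put("group.id", "spark-consumer");
        // Kafka 0.8 high-level consumer rebalance settings: give
        // slow-starting executors more attempts, spaced further apart
        props.put("rebalance.max.retries", "10");   // default is 4
        props.put("rebalance.backoff.ms", "5000");  // default is 2000
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("rebalance.max.retries"));
    }
}
```

These properties would then be passed to the consumer (or, in Spark, via the `kafkaParams` map overload of `KafkaUtils.createStream`).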



> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/KafkaInputDStream-mapping-of-partitions-to-tasks-tp3360p3391.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

