Spark in general isn't a good fit if you're trying to make sure that certain tasks only run on certain executors.

You can look at overriding getPreferredLocations and increasing the value of spark.locality.wait, but even then, what do you do when an executor fails?
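
For reference, a minimal sketch of the spark.locality.wait part, written as a spark-defaults.conf fragment. The value 10s is only an illustration, not a recommendation; the default is 3s. A longer wait makes the scheduler hold a task longer for a slot at its preferred location before falling back to a less local one. It only helps if the RDD actually reports the locations you want from getPreferredLocations.

```
# spark-defaults.conf (the same key can be passed as --conf key=value to spark-submit)
# Assumption: 10s is an arbitrary illustrative value; the default is 3s.
spark.locality.wait    10s
```

This is still a scheduling preference rather than a guarantee, so a failed executor leaves its tasks with nowhere preferred to run.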

On Fri, Feb 26, 2016 at 8:08 AM, patcharee wrote:

I am working on a streaming application integrated with Kafka through the createDirectStream API. The application consumes a Kafka topic that has 10 partitions, and it runs with 10 workers (--num-executors 10). When it reads data from Kafka/ZooKeeper, Spark creates 10 tasks (the same as the number of topic partitions). However, some executors are given more than one task and work through those tasks sequentially.

Why doesn't Spark distribute these 10 tasks across the 10 executors? How can I make it do that?