spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: kafka streaming topic partitions vs executors
Date Fri, 26 Feb 2016 17:06:36 GMT
Spark in general isn't a good fit if you're trying to make sure that
certain tasks only run on certain executors.

You can look at overriding getPreferredLocations and increasing the value
of spark.locality.wait, but even then, what do you do when an executor
fails?
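
A rough sketch of what that could look like, in case it helps. The class
HostPreferringRDD, the hosts list, and the 10s value are made up for
illustration and are not part of Spark or of the Kafka integration. The
locations returned are only hints to the scheduler, not guarantees.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, SparkConf, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical wrapper RDD: reuses the parent's partitions and data, but
// tells the scheduler that partition i would prefer hosts(i % hosts.length).
class HostPreferringRDD[T: ClassTag](prev: RDD[T], hosts: Seq[String])
    extends RDD[T](prev) {

  require(hosts.nonEmpty, "at least one host name is needed")

  // Same partitions as the parent (one-to-one dependency).
  override protected def getPartitions: Array[Partition] =
    firstParent[T].partitions

  // Delegate the actual computation to the parent RDD.
  override def compute(split: Partition, context: TaskContext): Iterator[T] =
    firstParent[T].iterator(split, context)

  // Locality hint only: the scheduler may still place the task elsewhere.
  override protected def getPreferredLocations(split: Partition): Seq[String] =
    Seq(hosts(split.index % hosts.length))
}

object LocalityExample {
  // Assumed value: wait up to 10s for a preferred slot before the scheduler
  // falls back to a less local level (the default is 3s).
  def confWithLongerWait(): SparkConf =
    new SparkConf().set("spark.locality.wait", "10s")
}
```

Once spark.locality.wait expires, or if the executor on the preferred host
is gone, the task is scheduled wherever a slot is free, which is why this
cannot be treated as strict pinning.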

On Fri, Feb 26, 2016 at 8:08 AM, patcharee <Patcharee.Thongtra@uni.no>
wrote:

> Hi,
>
> I am working on a streaming application integrated with Kafka through the
> createDirectStream API. The application streams a topic that contains 10
> partitions (on Kafka). It runs with 10 workers (--num-executors 10).
> When it reads data from Kafka/ZooKeeper, Spark creates 10 tasks (the same
> number as the topic's partitions). However, some executors are given more
> than one task and work through them sequentially.
>
> Why does Spark not distribute these 10 tasks across the 10 executors? How
> can I make it do that?
>
> Thanks,
> Patcharee
