spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: [Spark Streaming+Kafka][How-to]
Date Thu, 16 Mar 2017 15:10:56 GMT
Spark just really isn't a good fit for trying to pin particular computation
to a particular executor, especially if you're relying on that for
correctness.

On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami <sami.ouassaidi@mind7.fr>
wrote:

>
> Hi all,
>
> So I need to specify how an executor should consume data from a kafka
> topic.
>
> Let's say I have 2 topics : t0 and t1 with two partitions each, and two
> executors e0 and e1 (both can be on the same node so assign strategy does
> not work since in the case of a multi executor node it works based on round
> robin scheduling, whatever first available executor consumes the topic
> partition )
>
> What I would like to do is make e0 consume partition 0 from both t0 and t1
> while e1 consumes partition 1 from the t0 and t1. Is there no way around it
> except messing with scheduling ? If so what's the best approach.
>
> The reason for doing so is that executors will write to a cassandra
> database and since we will be in a parallelized context one executor might
> "collide" with another and therefore data will be lost, by assigning a
> partition I want to force the executor to process the data sequentially.
>
> Thanks
> Sami
> --
> *Mind7 Consulting*
>
> Sami Ouassaid | Consultant Big Data | sami.ouassaidi@mind7.com
> __
>
> 64 Rue Taitbout, 75009 Paris
> ᐧ
>

Mime
View raw message