spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Cody Koeninger <c...@koeninger.org>
Subject Re: How does MapWithStateRDD distribute the data
Date Wed, 03 Aug 2016 16:34:06 GMT
Are you using KafkaUtils.createDirectStream?

On Wed, Aug 3, 2016 at 9:42 AM, Soumitra Johri
<soumitra.siddharth@gmail.com> wrote:
> Hi,
>
> I am running a steaming job with 4 executors and 16 cores so that each
> executor has two cores to work with. The input Kafka topic has 4 partitions.
> With this given configuration I was expecting MapWithStateRDD to be evenly
> distributed across all executors, how ever I see that it uses only two
> executors on which MapWithStateRDD data is distributed. Sometimes the data
> goes only to one executor.
>
> How can this be explained and pretty sure there would be some math to
> understand this behavior.
>
> I am using the standard standalone 1.6.2 cluster.
>
> Thanks
> Soumitra

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Mime
View raw message