spark-user mailing list archives

From Ben Teeuwen <>
Subject Re: How does MapWithStateRDD distribute the data
Date Wed, 03 Aug 2016 16:32:51 GMT
Did you check the executor logs to see whether the Kafka offsets were pulled in evenly across
the 4 executors?

I recall a similar situation with uneven balancing from a Kafka stream. I ended up
raising the amount of resources (RAM and cores), after which it balanced out nicely. I don’t
understand the mechanism behind it, though.

> On Aug 3, 2016, at 4:42 PM, Soumitra Johri <> wrote:
> Hi,
> I am running a streaming job with 4 executors and 16 cores so that each executor has two
> cores to work with. The input Kafka topic has 4 partitions.
> With this configuration I was expecting MapWithStateRDD to be evenly distributed
> across all executors; however, I see that MapWithStateRDD data is distributed across only
> two executors. Sometimes the data goes to only one executor.
> How can this be explained? I am pretty sure there is some math behind it.
> I am using the standard standalone 1.6.2 cluster.
> Thanks
> Soumitra
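
One possible piece of the "math" asked about above is key-to-partition hashing. The sketch below is a plain-Python simulation, not Spark code. It assumes the state RDD is hash-partitioned by key, i.e. partition = hash(key) mod numPartitions, which as far as I know is what mapWithState falls back to when no partitioner is set on the StateSpec. The key sets and the partition count of 8 are hypothetical and chosen only for illustration. Which executor ends up holding each partition is decided by the scheduler and is not modelled here.

```python
from collections import Counter


def hash_partition(key_hash: int, num_partitions: int) -> int:
    """Map a key's hash code to a partition index.

    Python's % already yields a non-negative result for a positive
    modulus, which matches a "non-negative mod" hash partitioner.
    """
    return key_hash % num_partitions


def partition_counts(key_hashes, num_partitions):
    """Count how many keys land in each partition."""
    counts = Counter(hash_partition(h, num_partitions) for h in key_hashes)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical key hashes (assumption: integer keys hash to themselves).
skewed_keys = [0, 8, 16, 24, 1, 9]   # few distinct keys, colliding modulo 8
spread_keys = list(range(16))        # keys that cover every residue modulo 8

print(partition_counts(skewed_keys, 8))   # [4, 2, 0, 0, 0, 0, 0, 0]
print(partition_counts(spread_keys, 8))   # [2, 2, 2, 2, 2, 2, 2, 2]
```

Under these assumptions, a small or unlucky key set fills only a couple of partitions no matter how many executors are available, so the state would sit on at most that many executors. Comparing the number of distinct keys per batch against the partition count may be a useful first check.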
