spark-user mailing list archives

From Jakub Stransky <stransky...@gmail.com>
Subject Standalone cluster node utilization
Date Thu, 14 Jul 2016 16:18:49 GMT
Hello,

I have a Spark cluster running in standalone mode: one master plus 6 executors.

My application reads data from a database via DataFrame.read and then filters rows. After that I repartition the data, and I wonder why, on the Executors page of the driver UI, I still see all RDD blocks allocated on a single executor machine.

[image: Inline images 1]
As highlighted in the picture above. I expected that after the repartition the data would be shuffled across the cluster, but that is obviously not happening here.

I can understand that the database read happens in a non-parallel fashion, but repartitioning should fix that, as far as I understand.
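For reference, a minimal sketch of the two approaches involved. The first is a partitioned JDBC read, which parallelizes the read itself so the data never lands on a single executor in the first place; the second is an explicit repartition after a single-task read. All names here (jdbcUrl, the "events" table, the "id" and "status" columns, the bounds) are placeholder assumptions, not taken from the original message:

```scala
// Option A (sketch): parallelize the JDBC read itself.
// partitionColumn must be a numeric, date, or timestamp column;
// Spark splits [lowerBound, upperBound] into numPartitions range queries.
val df = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)               // placeholder connection string
  .option("dbtable", "events")          // placeholder table
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "6")         // roughly one read task per executor
  .load()

// Option B (sketch): single-task read followed by an explicit shuffle.
val filtered = df.filter($"status" === "active")
val repartitioned = filtered.repartition(6)

// Transformations are lazy: the shuffle only runs when an action fires,
// and RDD blocks shown on the Executors page reflect cached/persisted
// blocks, which are not redistributed by a later repartition.
repartitioned.count()
```

One thing worth checking: if the DataFrame was cached before the repartition, the Executors page will keep showing the original blocks where they were first materialized; the repartition creates a new, separate lineage.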

Could someone experienced clarify this?

Thanks
