spark-user mailing list archives

From Darin McBeath <>
Subject spark 1.6 foreachPartition only appears to be running on one executor
Date Fri, 11 Mar 2016 17:57:24 GMT
I've run into a situation where it would appear that foreachPartition is only running on one
of my executors.

I have a small cluster (2 executors with 8 cores each).

When I run a job with a small file (with 16 partitions) I can see that the 16 partitions are
initialized, but they all appear to be initialized on only one executor.  All of the work then
runs on this one executor (even though the number of partitions is 16). This seems odd, but
at least it works.  I'm not sure why the other executor was not used.

However, when I run a larger file (again with 16 partitions), the 16 partitions are once again
initialized on a single executor, but this time the subsequent work is spread across both
executors.  This of course causes problems, because the second executor was never initialized:
all of the partitions were initialized on the first one.
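
For anyone trying to reproduce this, here is a minimal sketch of one way to check which
executor each partition actually runs on. It is not the job described above: it assumes
PySpark, and the names describe_partition and placement are hypothetical helpers made up for
illustration. Note that the second argument to textFile is only a minimum partition count.

```python
import os
import socket


def describe_partition(index, rows):
    """Yield one (host, pid, partition index, row count) record for a partition.

    Hypothetical helper: it runs wherever the partition is processed, so the
    host and pid identify the executor process that handled it.
    """
    count = sum(1 for _ in rows)
    yield (socket.gethostname(), os.getpid(), index, count)


def placement(sc, path, min_partitions=16):
    """Return the per-partition records for a text file.

    Assumes `sc` is an existing SparkContext and `path` is readable by
    every executor.
    """
    rdd = sc.textFile(path, min_partitions)
    return rdd.mapPartitionsWithIndex(describe_partition).collect()
```

If every record comes back with the same host and pid, all of the partitions were handled by
a single executor process for that stage.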

Does anyone have any suggestions for where I might want to investigate?  Has anyone else seen
something like this before?  Any thoughts/insights would be appreciated.  I'm using the standalone
cluster manager, with the cluster started with the Spark EC2 scripts, and submitting my job using


