spark-user mailing list archives

From Darin McBeath <ddmcbe...@yahoo.com.INVALID>
Subject Re: spark 1.6 foreachPartition only appears to be running on one executor
Date Fri, 11 Mar 2016 18:32:57 GMT


I can verify this by looking at the log files for the workers.

Since the object called by foreachPartition emits logging statements, I can see them being
logged. Oddly, these statements appear in only one executor's log (and not the other's).
They occur 16 times on that executor, since there are 16 partitions. This seems odd because
that executor has only 8 cores, and the other executor doesn't appear to be involved in the
foreachPartition call at all. But when I then run a map function on the same RDD, things
start blowing up on the other executor as it begins doing work for some partitions (even
though all of the partitions appear to have been initialized only on the first executor).
The executor that was used in the foreachPartition call works fine and doesn't experience
any issue. But because the other executor fails on every request, the job dies.
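A minimal sketch of the usual fix for this kind of pattern (assuming the goal of the foreachPartition pass was one-time per-executor setup; the names below are illustrative, not Spark API): initialize lazily inside each task, so whichever executor happens to receive work sets itself up on first use, instead of depending on a separate initialization pass whose tasks may all land on one executor.

```python
# Illustrative sketch (not Spark code): per-process lazy initialization.
# A prior foreachPartition pass only initializes the worker processes that
# happened to run its tasks; lazy init on first use covers every worker.

class PartitionWorker:
    """Per-process singleton holding expensive state (e.g. a connection)."""
    _instance = None

    @classmethod
    def get(cls):
        # First task to run in this process performs the setup; later tasks
        # in the same process reuse the already-initialized instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self.initialized = True  # stand-in for expensive one-time setup

    def process(self, record):
        return record.upper()    # stand-in for real per-record work

def map_partition(records):
    # Called once per partition (as with mapPartitions): safe on any
    # executor, because initialization happens on demand.
    worker = PartitionWorker.get()
    return [worker.process(r) for r in records]
```

With this shape, a partition scheduled on a previously unused executor still initializes correctly, rather than failing because an earlier foreachPartition pass never ran there.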

Darin.


________________________________
From: Jacek Laskowski <jacek@japila.pl>
To: Darin McBeath <ddmcbeath@yahoo.com> 
Cc: user <user@spark.apache.org>
Sent: Friday, March 11, 2016 1:24 PM
Subject: Re: spark 1.6 foreachPartition only appears to be running on one executor



Hi, 
How do you check which executor is used? Can you include a screenshot of the master's webUI
with workers? 
Jacek 
11.03.2016 6:57 PM "Darin McBeath" <ddmcbeath@yahoo.com.invalid> wrote:

I've run into a situation where it would appear that foreachPartition is only running on one
of my executors.
>
>I have a small cluster (2 executors with 8 cores each).
>
>When I run a job with a small file (with 16 partitions), I can see that the 16 partitions
are initialized, but they all appear to be initialized on only one executor. All of the work
then runs on this one executor (even though the number of partitions is 16). This seems odd,
but at least it works. I'm not sure why the other executor was not used.
>
>However, when I run a larger file (once again with 16 partitions), I can see that the 16
partitions are again all initialized on the same executor. But this time the subsequent work
is spread across the 2 executors. This of course causes problems, because the second
executor was never initialized: all of the partitions were initialized only on the first
executor.
>
>Does anyone have any suggestions for where I might want to investigate? Has anyone else
seen something like this before? Any thoughts/insights would be appreciated. I'm using the
standalone cluster manager, the cluster was started with the spark-ec2 scripts, and I'm
submitting my job with spark-submit.
>
>Thanks.
>
>Darin.
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>For additional commands, e-mail: user-help@spark.apache.org
>
>

