spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jacek Laskowski <ja...@japila.pl>
Subject Re: spark 1.6 foreachPartition only appears to be running on one executor
Date Fri, 11 Mar 2016 18:40:49 GMT
Hi,

Could you share the code with foreachPartition?

Jacek
11.03.2016 7:33 PM "Darin McBeath" <ddmcbeath@yahoo.com> napisał(a):

>
>
> I can verify this by looking at the log file for the workers.
>
> Since I output logging statements in the object called by the
> foreachPartition, I can see the statements being logged. Oddly, these
> output statements only occur in one executor (and not the other).  It
> occurs 16 times in this executor  since there are 16 partitions.  This
> seems odd as there are only 8 cores on the executor and the other executor
> doesn't appear to be called at all in the foreachPartition call.  But, when
> I go to do a map function on this same RDD then things start blowing up on
> the other executor as it starts doing work for some partitions (although,
> it would appear that all partitions were only initialized on the other
> executor). The executor that was used in the foreachPartition call works
> fine and doesn't experience issue.  But, because the other executor is
> failing on every request the job dies.
>
> Darin.
>
>
> ________________________________
> From: Jacek Laskowski <jacek@japila.pl>
> To: Darin McBeath <ddmcbeath@yahoo.com>
> Cc: user <user@spark.apache.org>
> Sent: Friday, March 11, 2016 1:24 PM
> Subject: Re: spark 1.6 foreachPartition only appears to be running on one
> executor
>
>
>
> Hi,
> How do you check which executor is used? Can you include a screenshot of
> the master's webUI with workers?
> Jacek
> 11.03.2016 6:57 PM "Darin McBeath" <ddmcbeath@yahoo.com.invalid>
> napisał(a):
>
> I've run into a situation where it would appear that foreachPartition is
> only running on one of my executors.
> >
> >I have a small cluster (2 executors with 8 cores each).
> >
> >When I run a job with a small file (with 16 partitions) I can see that
> the 16 partitions are initialized but they all appear to be initialized on
> only one executor.  All of the work then runs on this  one executor (even
> though the number of partitions is 16). This seems odd, but at least it
> works.  Not sure why the other executor was not used.
> >
> >However, when I run a larger file (once again with 16 partitions) I can
> see that the 16 partitions are initialized once again (but all on the same
> executor).  But, this time subsequent work is now spread across the 2
> executors.  This of course results in problems because the other executor
> was not initialized as all of the partitions were only initialized on the
> other executor.
> >
> >Does anyone have any suggestions for where I might want to investigate?
> Has anyone else seen something like this before?  Any thoughts/insights
> would be appreciated.  I'm using the Stand Alone Cluster manager, cluster
> started with the spark ec2 scripts  and submitting my job using
> spark-submit.
> >
> >Thanks.
> >
> >Darin.
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> >For additional commands, e-mail: user-help@spark.apache.org
> >
> >
>

Mime
View raw message