spark-user mailing list archives

From Joe Wass <>
Subject PermGen issues on AWS
Date Fri, 09 Jan 2015 10:38:41 GMT
I'm running on an AWS cluster of 10 x m1.large (64 bit, 7.5 GiB RAM). FWIW
I'm using the Flambo Clojure wrapper which uses the Java API but I don't
think that should make any difference. I'm running with the following command:

spark/bin/spark-submit --class mything.core --name "My Thing" --conf
spark.yarn.executor.memoryOverhead=4096 --conf
"spark.executor.extraJavaOptions=… -XX:+CMSPermGenSweepingEnabled" /root/spark/code/myjar.jar
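
(Part of the extraJavaOptions string got mangled in my paste above. Fully spelled out, what I'm trying to pass would look something like the following; the -XX:MaxPermSize=512m value here is illustrative, not necessarily what I'm running:)

```shell
spark/bin/spark-submit --class mything.core --name "My Thing" \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --conf "spark.executor.extraJavaOptions=-XX:MaxPermSize=512m -XX:+CMSPermGenSweepingEnabled" \
  /root/spark/code/myjar.jar
```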

For one of the stages I'm getting errors:

 - ExecutorLostFailure (executor lost)
 - Resubmitted (resubmitted due to lost executor)

And I think they're caused by slave executor JVMs dying with this error:

java.lang.OutOfMemoryError: PermGen space
        java.lang.Class.getDeclaredConstructors0(Native Method)
        ...
        sun.reflect.MethodAccessorGenerator$1.run(...)
        sun.reflect.ReflectionFactory.newConstructorForSerialization(...)

(Stack trace truncated; the middle frames didn't survive my copy-paste.)

1 stage out of 14 (so far) is failing: 1768 of 1862 tasks succeeded and 940
failed. Of the failures, 7 tasks failed with the PermGen OOM and 919 were
"Resubmitted (resubmitted due to lost executor)".

My "Aggregated Metrics by Executor" shows "CANNOT FIND ADDRESS" for 10 out of
16 executors, which I imagine means those JVMs blew up and haven't been
restarted. The 'Executors' tab now shows only 7 executors.

 - Is this normal?
 - Any ideas why this is happening?
 - Any other measures I can take to prevent this?
 - Is the rest of my app going to run on a reduced number of executors?
 - Can I re-start the executors mid-application? This is a long-running
job, so I'd like to do what I can whilst it's running, if possible.
 - Am I correct in thinking that the --conf arguments are supplied to the
JVMs of the slave executors, so they will be receiving the extraJavaOptions
and memoryOverhead?
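
On that last point, I was thinking of verifying it myself by logging the JVM arguments the executor process was actually started with. A minimal standalone sketch (the class name is mine; run the same getInputArguments() call from inside a task to check an executor rather than the driver):

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Prints the arguments the current JVM was launched with. Executed inside
// an executor (e.g. from a mapPartitions function), the output would show
// whether the spark.executor.extraJavaOptions flags actually arrived.
public class ShowJvmArgs {
    public static void main(String[] args) {
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String a : jvmArgs) {
            System.out.println(a);
        }
    }
}
```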

Thanks very much!

