spark-user mailing list archives

From Joe Wass <jw...@crossref.org>
Subject Re: PermGen issues on AWS
Date Fri, 09 Jan 2015 12:15:14 GMT
Thanks, I noticed this after posting. I'll try that.
I also suspect Clojure may be creating more classes than the equivalent Java
would, so I'll nudge the PermGen size a bit higher.
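
For reference, what I plan to try is along these lines (the same invocation as
before, with an explicit PermGen cap added; 256m is a starting point that I may
need to raise given how many classes Clojure generates):

spark/bin/spark-submit --class mything.core --name "My Thing" --conf
spark.yarn.executor.memoryOverhead=4096 --conf
spark.executor.extraJavaOptions="-XX:MaxPermSize=256m
-XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled"
/root/spark/code/myjar.jar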

On 9 January 2015 at 11:45, Sean Owen <sowen@cloudera.com> wrote:

> It's normal for PermGen to be a bit more of an issue with Spark than
> for other JVM-based applications. You should simply increase the
> PermGen size, which I don't see in your command. For example, -XX:MaxPermSize=256m
> lets it grow to 256MB. The right size depends on your
> total heap size and app.
>
> Also, Java 8 no longer has a permanent generation, so this particular
> type of problem goes away and this tuning is not needed. You might
> consider running on Java 8.
>
> On Fri, Jan 9, 2015 at 10:38 AM, Joe Wass <jwass@crossref.org> wrote:
> > I'm running on an AWS cluster of 10 x m1.large (64 bit, 7.5 GiB RAM). FWIW
> > I'm using the Flambo Clojure wrapper, which uses the Java API, but I don't
> > think that should make any difference. I'm running with the following
> > command:
> >
> > spark/bin/spark-submit --class mything.core --name "My Thing" --conf
> > spark.yarn.executor.memoryOverhead=4096 --conf
> > spark.executor.extraJavaOptions="-XX:+CMSClassUnloadingEnabled
> > -XX:+CMSPermGenSweepingEnabled" /root/spark/code/myjar.jar
> >
> > For one of the stages I'm getting errors:
> >
> >  - ExecutorLostFailure (executor lost)
> >  - Resubmitted (resubmitted due to lost executor)
> >
> > And I think they're caused by slave executor JVMs dying with this error:
> >
> > java.lang.OutOfMemoryError: PermGen space
> >         java.lang.Class.getDeclaredConstructors0(Native Method)
> >         java.lang.Class.privateGetDeclaredConstructors(Class.java:2585)
> >         java.lang.Class.getConstructor0(Class.java:2885)
> >         java.lang.Class.newInstance(Class.java:350)
> >         sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
> >         sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
> >         java.security.AccessController.doPrivileged(Native Method)
> >         sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
> >         sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
> >         sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
> >         java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
> >         java.io.ObjectStreamClass.access$1500(ObjectStreamClass.java:72)
> >         java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
> >         java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
> >         java.security.AccessController.doPrivileged(Native Method)
> >         java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
> >         java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
> >         java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
> >         java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
> >         java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >
> > 1 stage out of 14 (so far) is failing. The failing stage shows 1768
> > succeeded / 1862 tasks (940 failed). 7 tasks failed with OOM; 919 were
> > "Resubmitted (resubmitted due to lost executor)".
> >
> > Now my "Aggregated Metrics by Executor" shows that 10 out of 16 executors
> > show "CANNOT FIND ADDRESS", which I imagine means the JVM blew up and hasn't
> > been restarted. Now the 'Executors' tab shows only 7 executors.
> >
> >  - Is this normal?
> >  - Any ideas why this is happening?
> >  - Any other measures I can take to prevent this?
> >  - Is the rest of my app going to run on a reduced number of executors?
> >  - Can I re-start the executors mid-application? This is a long-running job,
> >    so I'd like to do what I can whilst it's running, if possible.
> >  - Am I correct in thinking that the --conf arguments are supplied to the
> >    JVMs of the slave executors, so they will be receiving the extraJavaOptions
> >    and memoryOverhead?
> >
> > Thanks very much!
> >
> > Joe
>
