From: Shivani Rao <raoshiv...@gmail.com>
Subject: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space
Date: Wed, 18 Jun 2014 21:17:50 GMT
I am trying to process a file that contains 4 log lines (not very long), parse
them into case classes, and write the results to a destination folder, but I get
the following error:


java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
    at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
    at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
    at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
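
For context, the job itself is tiny. It is essentially doing the following (a
minimal sketch in spark-shell style; the LogEntry case class, parseLine, and
the paths are illustrative, not my exact code):

    import org.apache.spark.SparkContext

    // Illustrative case class and parser; the real ones are similar in spirit.
    case class LogEntry(timestamp: String, level: String, message: String)

    def parseLine(line: String): LogEntry = {
      val fields = line.split("\\s+", 3)
      LogEntry(fields(0), fields(1), fields(2))
    }

    val sc = new SparkContext("spark://master:7077", "log-parser")

    // Read the 4-line input file, parse each line into a case class,
    // and write the results to a destination folder.
    sc.textFile("hdfs:///input/logs.txt")
      .map(parseLine)
      .saveAsTextFile("hdfs:///output/parsed")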


Sadly, several folks have run into this error while executing Spark jobs, and
various solutions have been suggested, but none of them work for me:


a) Following
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736,
I tried changing the number of partitions in my RDD with coalesce(8), but the
error persisted (see the sketch after this list).

b) I tried setting SPARK_WORKER_MEMORY=2g and SPARK_EXECUTOR_MEMORY=10g, but
neither helped (also sketched below).

c) I strongly suspect a classpath error (
http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html),
mainly because the call stack is repetitive. Maybe the OOM error is a disguise?

d) I checked that I am not out of disk space and that I do not have too many
open files (the count from sudo ls /proc/<spark_master_process_id>/fd | wc -l
is well below the ulimit -n limit).
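
For concreteness, here is roughly what (a) looks like on my end (again a
sketch, reusing the illustrative parseLine and paths from above):

    // (a) Coalesce down to 8 partitions before writing -- the error persisted.
    sc.textFile("hdfs:///input/logs.txt")
      .map(parseLine)
      .coalesce(8)
      .saveAsTextFile("hdfs:///output/parsed")

And for (b), I set the variables in conf/spark-env.sh; if I understand the
0.9.x config docs right, the in-code equivalent would be something like:

    // (b) On 0.9.x, executor memory can also be set as a Java system property
    // before the SparkContext is created (same knob as SPARK_EXECUTOR_MEMORY).
    System.setProperty("spark.executor.memory", "10g")
    val sc = new SparkContext("spark://master:7077", "log-parser")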


I am also noticing a lot of reflection happening to find the right class, I
guess, so it could be a ClassNotFound error disguising itself as a memory
error.
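
If it helps anyone check the classpath theory, I can dump the driver's
classpath like this (plain JVM property, nothing Spark-specific):

    // Print the driver JVM's classpath entries, one per line.
    System.getProperty("java.class.path")
      .split(java.io.File.pathSeparator)
      .foreach(println)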


Here are other threads that ran into the same situation, but they have not been
resolved so far:


http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html

http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html


Any help is greatly appreciated. I am especially calling out to the creators of
Spark and the Databricks folks. This seems like a "known bug" waiting to happen.


Thanks,

Shivani

-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA
