spark-user mailing list archives

From Shivani Rao <raoshiv...@gmail.com>
Subject Re: Spark 0.9.1 java.lang.outOfMemoryError: Java Heap Space
Date Fri, 20 Jun 2014 15:15:37 GMT
Hello Abhi, I did try that and it did not work.

And Eugen, yes, I am assembling the argonaut libraries in the fat jar. So
how did you overcome this problem?

Shivani


On Fri, Jun 20, 2014 at 1:59 AM, Eugen Cepoi <cepoi.eugen@gmail.com> wrote:

>
> On 20 June 2014 at 01:46, "Shivani Rao" <raoshivani@gmail.com> wrote:
>
> >
> > Hello Andrew,
> >
> > I wish I could share the code, but for proprietary reasons I can't. I can
> > give some idea of what I am trying to do, though. The job reads a file and
> > processes each line of that file. I am not doing anything intense in the
> > "processLogs" function.
> >
> > import argonaut._
> > import argonaut.Argonaut._
> >
> >
> > /* All of these case classes are created from JSON strings extracted
> >  * from the line in the processLogs() function.
> >  */
> > case class struct1…
> > case class struct2…
> > case class value1(struct1, struct2)
> >
> > def processLogs(line: String): Option[(key1, value1)] = {…
> > }
> >
> > import org.apache.spark.{SparkConf, SparkContext}
> >
> > def run(sparkMaster: String, appName: String, executorMemory: String, jarsPath: Seq[String]) {
> >   val sparkConf = new SparkConf()
> >   sparkConf.setMaster(sparkMaster)
> >   sparkConf.setAppName(appName)
> >   sparkConf.set("spark.executor.memory", executorMemory)
> >   sparkConf.setJars(jarsPath) // This includes all the relevant jars
> >   val sc = new SparkContext(sparkConf)
> >   val rawLogs = sc.textFile("hdfs://<my-hadoop-namenode>:8020/myfile.txt")
> >   // Write the input straight back out, to test I/O without any parsing
> >   rawLogs.saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/writebackForTesting")
> >   // Parse each line; lines for which processLogs returns None are dropped
> >   rawLogs.flatMap(processLogs).saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/outfile.txt")
> > }
> >
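The flatMap(processLogs) step above relies on processLogs returning an Option, so lines that fail to parse are dropped instead of failing the job. Here is a minimal sketch of that pattern on a plain Scala collection, with no Spark involved. The names FlatMapOptionDemo, parseLine and parseAll, and the key=value line format, are hypothetical stand-ins for the elided processLogs and its JSON parsing.

```scala
import scala.util.Try

object FlatMapOptionDemo {
  // Hypothetical stand-in for processLogs: parses "key=<int>" lines and
  // returns None for anything that does not match that shape.
  def parseLine(line: String): Option[(String, Int)] =
    line.split("=", 2) match {
      case Array(k, v) => Try(v.trim.toInt).toOption.map(n => (k, n))
      case _           => None
    }

  // flatMap over an Option-returning function keeps only the Some values.
  def parseAll(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(line => parseLine(line))
}
```

With input Seq("a=1", "bad", "b=2"), parseAll keeps the two well-formed lines and silently drops "bad". Work of this kind on four short lines should need very little heap.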
> > If I switch to "local" mode, the code runs just fine; in cluster mode it
> > fails with the error I pasted above. In cluster mode, even writing back the
> > file we just read fails
> > (rawLogs.saveAsTextFile("hdfs://<my-hadoop-namenode>:8020/writebackForTesting")).
> >
> > I still believe this is a ClassNotFound error in disguise.
> >
>
> Indeed, you are right, this could be the reason. I had similar errors when
> defining case classes in the shell and trying to use them in RDDs. Are
> you shading argonaut in the fat jar?
>
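One way to test the ClassNotFound hypothesis would be to check, from inside the JVM that runs the tasks, whether the classes in question can be loaded by name. Here is a minimal sketch in plain Scala. The names ClasspathCheck and isLoadable are made up for illustration, and how it gets invoked on the executors (for example from inside a map over a small RDD) is an assumption left to the reader.

```scala
import scala.util.Try

object ClasspathCheck {
  // True if the named class can be found through the current thread's
  // context class loader. The class is looked up but not initialized.
  // A missing class raises ClassNotFoundException, which Try captures.
  def isLoadable(className: String): Boolean =
    Try(
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
    ).isSuccess
}
```

ClasspathCheck.isLoadable("java.lang.String") is true in any JVM, while a name that is not on the classpath gives false. Calling it with an argonaut class name inside a task (argonaut.Json, assuming that is one the job needs) would indicate whether the fat jar's contents are visible where the tasks run.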
> > Thanks
> > Shivani
> >
> >
> >
> > On Wed, Jun 18, 2014 at 2:49 PM, Andrew Ash <andrew@andrewash.com> wrote:
> >>
> >> Wait, so the file only has four lines and the job is running out of heap
> >> space? Can you share the code you're running that does the processing?
> >> I'd guess that you're doing some intense processing on every line, but just
> >> writing parsed case classes back to disk sounds very lightweight.
> >>
> >> I
> >>
> >>
> >> On Wed, Jun 18, 2014 at 5:17 PM, Shivani Rao <raoshivani@gmail.com> wrote:
> >>>
> >>> I am trying to process a file that contains 4 log lines (not very long)
> >>> and then write my parsed case classes to a destination folder, and I get
> >>> the following error:
> >>>
> >>>
> >>> java.lang.OutOfMemoryError: Java heap space
> >>>     at org.apache.hadoop.io.WritableUtils.readCompressedStringArray(WritableUtils.java:183)
> >>>     at org.apache.hadoop.conf.Configuration.readFields(Configuration.java:2244)
> >>>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
> >>>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
> >>>     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
> >>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> >>>     at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:165)
> >>>     at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
> >>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
> >>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
> >>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
> >>>
> >>>
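The top frames of the trace show the failure happening while a Hadoop Configuration is deserialized from a broadcast, inside readCompressedStringArray. One way an OutOfMemoryError can appear on tiny input is a length prefix read from bytes that were written in a different layout (for instance by a different library version), so the reader tries to allocate an enormous array. This is offered only as a possibility; nothing in this thread confirms it. Here is a minimal sketch of that mechanism using plain java.io streams. The object name and the byte layout are invented for illustration and are not Hadoop's actual wire format.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object LengthPrefixDemo {
  // Writer's layout (hypothetical): four ASCII tag bytes, then an Int count.
  def write(): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeBytes("conf") // tag: 0x63 0x6f 0x6e 0x66
    out.writeInt(2)        // the real element count
    out.flush()
    bytes.toByteArray
  }

  // Reader's layout (hypothetical, mismatched): expects the Int count first,
  // so it ends up interpreting the tag bytes as the count.
  def readCount(data: Array[Byte]): Int =
    new DataInputStream(new ByteArrayInputStream(data)).readInt()
}
```

Here the writer stored a count of 2, but the mismatched reader gets back a count above one billion. Allocating an array of that size would exhaust a typical heap regardless of how small the actual input file is.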
> >>> Sadly, several folks have faced this error while trying to execute Spark
> >>> jobs, and there are various suggested solutions, none of which work for me:
> >>>
> >>>
> >>> a) I tried changing the number of partitions in my RDD by using
> >>> coalesce(8) (see
> >>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-0-java-lang-outOfMemoryError-Java-Heap-Space-td7735.html#a7736)
> >>> and the error persisted.
> >>>
> >>> b) I tried setting SPARK_WORKER_MEM=2g and SPARK_EXECUTOR_MEMORY=10g, and
> >>> neither worked.
> >>>
> >>> c) I strongly suspect there is a classpath error (see
> >>> http://apache-spark-user-list.1001560.n3.nabble.com/how-to-set-spark-executor-memory-and-heap-size-td4719.html),
> >>> mainly because the call stack is repetitive. Maybe the OOM error is a
> >>> disguise?
> >>>
> >>> d) I checked that I am not out of disk space and that I do not have too
> >>> many open files (ulimit -u << sudo ls /proc/<spark_master_process_id>/fd | wc -l).
> >>>
> >>>
> >>> I am also noticing multiple reflection calls happening, to find the right
> >>> class I guess, so it could be a "class not found" error disguising itself
> >>> as a memory error.
> >>>
> >>>
> >>> Here are other threads that describe the same situation but have not
> >>> been resolved so far:
> >>>
> >>>
> >>>
> http://apache-spark-user-list.1001560.n3.nabble.com/no-response-in-spark-web-UI-td4633.html
> >>>
> >>>
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-program-thows-OutOfMemoryError-td4268.html
> >>>
> >>>
> >>> Any help is greatly appreciated. I am especially calling on the creators
> >>> of Spark and the Databricks folks. This seems like a "known bug" waiting to
> >>> happen.
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Shivani
> >>>
> >>>
> >>> --
> >>> Software Engineer
> >>> Analytics Engineering Team@ Box
> >>> Mountain View, CA
> >>
> >>
> >
> >
> >
> > --
> > Software Engineer
> > Analytics Engineering Team@ Box
> > Mountain View, CA
>



-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA
