spark-user mailing list archives

From Pala M Muthaia <mchett...@rocketfuelinc.com>
Subject Re: OOM exception during row deserialization
Date Mon, 12 Jan 2015 22:48:40 GMT
Does anybody have insight on this? Thanks.

On Fri, Jan 9, 2015 at 6:30 PM, Pala M Muthaia <mchettiar@rocketfuelinc.com>
wrote:

> Hi,
>
> I am using Spark 1.0.1. I am trying to debug an OOM exception I saw during
> a join step.
>
> Basically, I have an RDD of rows that I am joining with another RDD of
> tuples.
>
> Some of the tasks succeed, but a fair number fail with an OOM exception and
> the stack trace below. The stack belongs to the 'reducer' that is reading
> shuffle output from the 'mapper'.
>
> My question is: what is the object being deserialized here, just a portion
> of an RDD, or the whole RDD partition assigned to the current reducer? The
> rows in the RDD could be large, but definitely not something that would run
> to 100s of MBs in size and thus run out of memory.
>
> Also, is there a way to determine the size of the object whose
> deserialization triggers the error (either by looking at some staging HDFS
> dir or logs)?
>
> java.lang.OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded)
> java.util.Arrays.copyOf(Arrays.java:2367)
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
> java.lang.StringBuilder.append(StringBuilder.java:204)
> java.io.ObjectInputStream$BlockDataInputStream.readUTFSpan(ObjectInputStream.java:3142)
> java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3050)
> java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2863)
> java.io.ObjectInputStream.readString(ObjectInputStream.java:1636)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1339)
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> java.util.ArrayList.readObject(ArrayList.java:771)
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:606)
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:125)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> org.apache.spark.storage.BlockManager$LazyProxyIterator$1.hasNext(BlockManager.scala:1031)
>
>
>
> Thanks,
> pala
>
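[Editor's note] On the question of measuring the size of an object before it hits the shuffle: one low-tech option is to sample a few rows on the driver and write them through plain JDK serialization, which is what the stack trace above is reading back. The sketch below is a hypothetical helper (`SerializedSizeEstimator` is not a Spark API); it only shows how to get the serialized byte count of any `Serializable` object.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

// Hypothetical helper: measures the Java-serialized size of an object by
// writing it into an in-memory buffer, mirroring what the shuffle writer does.
public class SerializedSizeEstimator {
    public static int serializedSize(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.close();
            return bytes.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A row with a handful of short string fields serializes to well under
        // a kilobyte; rows measuring in the 100s of MBs would confirm that a
        // single record, not a whole partition, is blowing up the reducer.
        ArrayList<String> row = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            row.add("field-" + i);
        }
        System.out.println("serialized size: " + serializedSize(row) + " bytes");
    }
}
```

If sampled rows come back small, the GC-overhead error more likely reflects allocation churn from many string-heavy records being deserialized in sequence rather than one oversized record.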
