spark-user mailing list archives

From Gary Malouf <malouf.g...@gmail.com>
Subject Re: StackOverflow still after implementing custom serializers when working with large data set
Date Mon, 23 Sep 2013 19:19:08 GMT
We are using protobuf, which under the covers does have this kind of deeply
nested structure.  I was under the impression that the custom serializer
solution would work out.  It helped some, but ultimately I needed a larger
stack size.
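
For anyone who finds this thread later, here is a minimal Java sketch of the
effect.  It is not our actual code: DeepChainDemo, Node, buildChain and
roundTrip are made-up names, and a plain linked chain stands in for the nested
protobuf data.  Default Java serialization recurses once per nested reference,
so a long enough chain can exhaust the thread stack, while the same round trip
may succeed on a thread that was given a larger stack.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; none of these names come from Spark or protobuf.
class DeepChainDemo {

    // A singly linked chain: each node references the next one, so default
    // Java serialization goes one level deeper for every node.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    // Build the chain iteratively so construction itself never recurses.
    static Node buildChain(int depth) {
        Node head = null;
        for (int i = depth - 1; i >= 0; i--) {
            Node n = new Node(i);
            n.next = head;
            head = n;
        }
        return head;
    }

    // Serialize and deserialize a chain of the given depth on a dedicated
    // thread. stackBytes is handed to the Thread constructor as a stack-size
    // hint (0 means "use the JVM default", i.e. whatever -Xss is in effect).
    // Returns true if the round trip succeeded, false on StackOverflowError.
    static boolean roundTrip(int depth, long stackBytes) {
        final boolean[] ok = {false};
        Runnable task = () -> {
            try {
                Node head = buildChain(depth);
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                    out.writeObject(head);
                }
                try (ObjectInputStream in = new ObjectInputStream(
                        new ByteArrayInputStream(bytes.toByteArray()))) {
                    Node copy = (Node) in.readObject();
                    int count = 0;
                    for (Node n = copy; n != null; n = n.next) count++;
                    ok[0] = (count == depth);
                }
            } catch (StackOverflowError e) {
                ok[0] = false; // the recursion ran out of stack
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };
        Thread t = new Thread(null, task, "roundtrip", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    public static void main(String[] args) {
        // Same chain, two stack sizes; the results depend on the JVM and platform.
        System.out.println("default stack: " + roundTrip(20_000, 0L));
        System.out.println("256 MB stack:  " + roundTrip(20_000, 256L * 1024 * 1024));
    }
}
```

The stack size passed to the Thread constructor is only a hint that some JVMs
may ignore.  On the Spark workers the equivalent knob is the JVM's -Xss, which
is what the 100m setting quoted below amounts to.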


On Mon, Sep 23, 2013 at 3:11 PM, Reynold Xin <rxin@cs.berkeley.edu> wrote:

> Hi Gary,
>
> I am really confused here - what does your custom serializer do? Do you
> have some data structure with a giant nested structure?
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Tue, Sep 17, 2013 at 1:40 PM, Gary Malouf <malouf.gary@gmail.com> wrote:
>
>> We ultimately solved this by setting a huge stack size of 100m in
>> spark-env.sh on the slave nodes.  Two things are deceptive about this:
>>
>> 1) That a gigantic stack is needed for deserialization
>>
>> 2) The docs seem to imply that the slave settings are determined at
>> runtime from the scheduler - this is not the case globally.
>>
>>
>> On Tue, Sep 17, 2013 at 12:38 PM, Gary Malouf <malouf.gary@gmail.com> wrote:
>>
>>> If more context is needed, I am happy to provide it.  This is a very
>>> troubling issue for us as it seriously limits how much data we can look at
>>> at a time in Spark.  For now, I am able to revert to Hive to get the job done.
>>>
>>>
>>> On Fri, Sep 13, 2013 at 3:19 PM, Gary Malouf <malouf.gary@gmail.com> wrote:
>>>
>>>> I was previously having issues with StackOverflowErrors when working
>>>> with one or two days' worth of data.  Steps I have taken since then:
>>>>
>>>> 1) Increase the stack size (Xss) from the default to 2m, and then to as
>>>> high as 200m
>>>> 2) Activate Kryo serialization
>>>> 3) Implement custom serializers for my protobuf messages
>>>>
>>>> While these changes have allowed me to grab up to 10 days' worth of
>>>> data, I cannot really go beyond that without the dreaded
>>>> StackOverflowError:
>>>>
>>>> java.lang.StackOverflowError
>>>>     at
>>>> java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2291)
>>>>     at
>>>> java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2584)
>>>>     at
>>>> java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2594)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1316)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1704)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>     at
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>>     at
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>>     at
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>>     at
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>>
>>>>
>>>> Seems like it gets stuck in an infinite loop of deserialization.  Has
>>>> anyone found ways to work through this?
>>>>
>>>
>>>
>>
>
