spark-user mailing list archives

From Eugene Morozov <fathers...@list.ru>
Subject Re: How to distribute non-serializable object in transform task or broadcast ?
Date Fri, 07 Aug 2015 17:44:37 GMT
I would like to add something, inline.

On 07 Aug 2015, at 18:51, Eugene Morozov <fathersson@list.ru> wrote:

> Hao, 
> 
> I’d say there are a few possible ways to achieve that:
> 1. Use KryoSerializer.
>   The flaw of KryoSerializer is that the current version (2.21) has an issue with internal
> state, so it might not work for some objects. Spark gets its Kryo dependency transitively through
> chill, so this will not be resolved quickly. Kryo doesn’t work for me (I have classes that
> I have to transfer, but I do not have their codebase).
> 
> 2. Wrap it in something you control and make that wrapper serializable.
>   The flaw is fairly obvious: it’s really hard to write serialization for complex
> objects.
> 
> 3. Tricky algorithm: don’t do anything that might end up as a reshuffle.
>   That’s the way I took. The flow is that we have a CSV file as input, parse it and create
> objects that we cannot serialize / deserialize and thus cannot transfer over the network. Currently
> we’ve worked around it so that these objects are processed only in the partitions where they
> were created.

That means it’s not possible, when debugging, to call collect() on our RDD (even though the Spark
master is local), but there is always a way to find out what’s inside.
The algorithm is pretty complex; we still do a reshuffle and everything, but only before we finally
create those objects.
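
For option 2, when the object can be rebuilt from serializable inputs, the wrapper does not have to serialize the object at all: it can ship only the "recipe" and recreate the object on first use in each JVM. Below is a minimal sketch in plain Java (no Spark needed to run it). OpaqueResource is a hypothetical stand-in for the class we cannot change, and LazyResource and WrapperDemo are names made up for this example. Any state the object accumulated before being shipped is lost, so this only fits objects that are cheap and safe to recreate.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a third-party class: not Serializable, no no-arg constructor.
class OpaqueResource {
    private final String config;
    OpaqueResource(String config) { this.config = config; }
    String describe() { return "resource(" + config + ")"; }
}

// Serializable wrapper: only the constructor argument travels over the wire.
class LazyResource implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String config;                 // serializable "recipe"
    private transient OpaqueResource instance;   // skipped by Java serialization

    LazyResource(String config) { this.config = config; }

    // Builds the real object on first use; after deserialization the
    // transient field is null, so the receiving JVM builds its own copy.
    synchronized OpaqueResource get() {
        if (instance == null) {
            instance = new OpaqueResource(config);
        }
        return instance;
    }
}

class WrapperDemo {
    // Serializes and deserializes the wrapper, as shipping it to another JVM would.
    static LazyResource roundTrip(LazyResource wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyResource) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        LazyResource copy = roundTrip(new LazyResource("settings"));
        System.out.println(copy.get().describe());
    }
}
```

In a Spark job the wrapper would be the thing captured by the closure or put into the broadcast, and get() would be called inside the task. That is an assumption about how one would wire it in, not something we have in our code.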

> 
> Hope this helps.
> 
> On 07 Aug 2015, at 12:39, Hao Ren <invkrh@gmail.com> wrote:
> 
>> Is there any workaround to distribute a non-serializable object for an RDD transformation
>> or a broadcast variable?
>> 
>> Say I have an object of class C which is not serializable. Class C is in a jar package,
>> and I have no control over it. Now I need to distribute it either by RDD transformation or by broadcast.

>> 
>> I tried to subclass class C with the Serializable interface. Serialization works,
>> but deserialization does not, since class C has no parameterless constructor,
>> and deserialization fails with an invalid constructor exception.
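
The failure described here can be reproduced without Spark. Java serialization requires the nearest non-serializable superclass to have an accessible no-arg constructor, so writing the subclass succeeds and reading it back throws java.io.InvalidClassException. A minimal sketch, where ThirdPartyC is a hypothetical stand-in for class C and SerializableC and SubclassDemo are names made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for class C: not Serializable, no no-arg constructor.
class ThirdPartyC {
    private final String config;
    ThirdPartyC(String config) { this.config = config; }
    String describe() { return "C(" + config + ")"; }
}

// The naive fix: subclass C and add the Serializable marker interface.
class SerializableC extends ThirdPartyC implements Serializable {
    private static final long serialVersionUID = 1L;
    SerializableC(String config) { super(config); }
}

class SubclassDemo {
    // Writing works: only the (empty) serializable part of the object is written.
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reading fails: the JVM cannot construct the ThirdPartyC part of the
    // object because ThirdPartyC has no no-arg constructor to call.
    static boolean deserializationFails(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.readObject();
            return false;
        } catch (InvalidClassException e) {
            return true;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("unexpected failure", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new SerializableC("settings"));
        System.out.println("wrote " + data.length + " bytes");
        System.out.println("deserialization fails: " + deserializationFails(data));
    }
}
```

Even if class C did have a no-arg constructor, its own fields would not be written by the subclass, so the restored object would come back with C's state reset to whatever that constructor sets.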
>> 
>> I think it's a common use case. Any help is appreciated.
>> 
>> -- 
>> Hao Ren
>> 
>> Data Engineer @ leboncoin
>> 
>> Paris, France
> 
> Eugene Morozov
> fathersson@list.ru
> 
> 
> 
> 

Eugene Morozov
fathersson@list.ru




