spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: How to access global kryo instance?
Date Tue, 07 Jan 2014 03:04:15 GMT
I see -- the answer is no, we do not currently use an object pool; instead
we just try to create instances less frequently (typically one
SerializerInstance per partition). For instance, you could do:

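rdd.mapPartitions { partitionIterator =>
  // One Kryo instance per partition; assumes spark.serializer is KryoSerializer
  val kryo = SparkEnv.get.serializer.asInstanceOf[KryoSerializer].newKryo()
  partitionIterator.map(row => doWorkWithKryo(kryo, row))
}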

This should amortize the cost greatly. The only requirement of an instance
is that it not be used by multiple threads simultaneously, and this fits
that requirement perfectly.
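
To make the amortization concrete, here is a minimal Spark-free sketch that
only counts how many instances each approach constructs. CostlySerializer,
its counter, and AmortizeDemo are hypothetical stand-ins (not part of Spark
or Kryo) for an object that is expensive to create and not thread-safe, and
the "partitions" are modelled as plain Scala collections.

```scala
// Hypothetical stand-in for an expensive, non-thread-safe object such as a
// Kryo instance. Not part of Spark or Kryo.
object CostlySerializer {
  var created = 0 // number of instances constructed so far
}

class CostlySerializer {
  CostlySerializer.created += 1 // record every construction
  def encode(row: String): Array[Byte] = row.getBytes("UTF-8")
}

object AmortizeDemo {
  // Two "partitions" of three rows each, modelled as plain collections.
  val partitions: Seq[Seq[String]] = Seq(Seq("a", "b", "c"), Seq("d", "e", "f"))

  // One instance per element: analogous to calling newKryo() inside map.
  def perElement(): Int = {
    CostlySerializer.created = 0
    partitions.foreach(_.foreach(row => new CostlySerializer().encode(row)))
    CostlySerializer.created
  }

  // One instance per partition: analogous to the mapPartitions pattern.
  def perPartition(): Int = {
    CostlySerializer.created = 0
    partitions.foreach { part =>
      val ser = new CostlySerializer() // created once, reused for every row
      part.foreach(row => ser.encode(row))
    }
    CostlySerializer.created
  }

  def main(args: Array[String]): Unit = {
    println(s"per element:   ${perElement()} instances")
    println(s"per partition: ${perPartition()} instances")
  }
}
```

With six rows in two partitions, the per-element version constructs six
instances and the per-partition version constructs two, so the creation cost
scales with the number of partitions rather than the number of rows.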


On Mon, Jan 6, 2014 at 6:59 PM, Aureliano Buendia <buendia360@gmail.com> wrote:

>
>
>
> On Tue, Jan 7, 2014 at 2:52 AM, Aaron Davidson <ilikerps@gmail.com> wrote:
>
>> Please take a look at the source code -- it's relatively friendly, and
>> very useful for digging into Spark internals! (KryoSerializer:
>> https://github.com/apache/incubator-spark/blob/master/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala)
>>
>> As you can see, a Kryo instance is available via ser.newKryo(). You can
>> also use Spark's SerializerInstance interface (which features serialize()
>> and deserialize() methods) by simply calling ser.newInstance().
>>
>
> Sorry, maybe I wasn't clear. What I meant was, does spark use a singleton
> instance of kryo that can be accessed inside the map closure?
>
> Calling ser.newKryo() for every element (inside a map closure) has a
> huge overhead, and it seems newKryo() doesn't use any caching. Twitter's
> chill uses an object pool for Kryo instances; I'm not sure how Spark
> handles this.
>
>
>>
>>
>> On Mon, Jan 6, 2014 at 5:20 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>>
>>> In a map closure, I could use:
>>>
>>> val ser = SparkEnv.get.serializer.asInstanceOf[KryoSerializer]
>>>
>>> But how do I get the Kryo instance that Spark uses from ser?
>>>
>>>
>>> On Tue, Jan 7, 2014 at 1:04 AM, Aaron Davidson <ilikerps@gmail.com> wrote:
>>>
>>>> I believe SparkEnv.get.serializer would return the serializer created
>>>> from the "spark.serializer" property.
>>>>
>>>> You can also obtain a Kryo serializer directly via its no-arg
>>>> constructor (it still invokes your spark.kryo.registrator):
>>>> val serializer = new KryoSerializer()
>>>> but this could have some overhead, and so should probably not be done
>>>> for every element you process.
>>>>
>>>>
>>>> On Mon, Jan 6, 2014 at 4:36 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to access the global kryo instance created by spark?
>>>>> I'm referring to the one which is passed to registerClasses() in a
>>>>> KryoRegistrator sub class.
>>>>>
>>>>> I'd like to access this kryo instance inside a map closure, so it
>>>>> should be accessible from the workers' side too.
>>>>>
>>>>
>>>>
>>>
>>
>
