hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12175) Upgrade Kryo version to 3.0.x
Date Tue, 24 Nov 2015 19:30:10 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025161#comment-15025161 ]

Prasanth Jayachandran commented on HIVE-12175:
----------------------------------------------

From what I understand, class registration in Kryo is optional. Registering a class assigns
it a unique integer ID. If a class is not registered, its FQCN is written out during
serialization instead. The default serializer for these classes is FieldSerializer, which
also handles the creation of objects. The default strategy is to invoke the zero-arg
constructor reflectively. If the zero-arg constructor is private, the reflection trick
(setAccessible) is used to create the instance. If that fails, it falls back to Objenesis's
StdInstantiatorStrategy, which creates the object without invoking any constructor.
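
As a rough illustration of the first two creation strategies described above, here is a
JDK-only sketch. It is not Kryo's code; PrivateCtorExample and newInstance are hypothetical
names introduced only for this example.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a user class whose zero-arg constructor is private.
class PrivateCtorExample {
    int value;

    private PrivateCtorExample() {
        value = 42; // only set if the constructor actually runs
    }
}

class ReflectiveInstantiation {
    // Looks up the zero-arg constructor, makes it accessible, and invokes it.
    static <T> T newInstance(Class<T> type) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // the "reflection trick" for private constructors
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            // A real framework would fall back to another strategy at this point.
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        PrivateCtorExample obj = newInstance(PrivateCtorExample.class);
        System.out.println(obj.value); // prints 42
    }
}
```

An instance created without invoking a constructor (the Objenesis fallback) would skip the
assignment in the constructor body, so value would stay at its default of 0.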

My understanding is that when a user adds custom jars and uses a custom UDF, the
serialization of ExpressionNode will write out the FQCN of the user's UDF. During
deserialization, as long as the UDF is on the classpath (the jars will be localized on task
nodes), a UDF instance can be created by the default FieldSerializer using the strategies
mentioned above.
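
To make the ID-versus-FQCN distinction concrete, here is a toy sketch of the idea. It is
not how Kryo implements class resolution; ToyClassResolver, writeTag, readTag, and the "#"
prefix are all hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class ToyClassResolver {
    private final Map<Class<?>, Integer> idsByClass = new HashMap<>();
    private final Map<Integer, Class<?>> classesById = new HashMap<>();

    // Registration assigns a compact integer ID to a class.
    void register(Class<?> type, int id) {
        idsByClass.put(type, id);
        classesById.put(id, type);
    }

    // Tag written ahead of the object data: "#<id>" if registered, otherwise the FQCN.
    String writeTag(Class<?> type) {
        Integer id = idsByClass.get(type);
        return id != null ? "#" + id : type.getName();
    }

    // Resolves a tag back to a class. An unregistered class must be on the classpath.
    Class<?> readTag(String tag) {
        if (tag.startsWith("#")) {
            return classesById.get(Integer.parseInt(tag.substring(1)));
        }
        try {
            return Class.forName(tag);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not on classpath: " + tag, e);
        }
    }

    public static void main(String[] args) {
        ToyClassResolver resolver = new ToyClassResolver();
        resolver.register(java.util.ArrayList.class, 10);
        System.out.println(resolver.writeTag(java.util.ArrayList.class));
        System.out.println(resolver.writeTag(java.util.LinkedList.class));
    }
}
```

In this sketch an unregistered class (such as a user UDF) round-trips only if Class.forName
can find it, which mirrors the requirement that the UDF jar be present on the task nodes.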

There are some cases where the serializers are not what we expect, such as sql.Date vs.
util.Date, and some cases where objects cannot be created using any of the above strategies.
That is when we explicitly register a custom serializer for specific classes. If a user UDF
hits any such case (e.g. Arrays.asList()) and we haven't provided a custom serializer, then
we are in trouble.
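
One way to see why lists from Arrays.asList() fall into this category: the returned class is
the private java.util.Arrays$ArrayList, which declares no zero-arg constructor, and an
instance created without running its constructor has no backing array. The JDK-only sketch
below checks this and shows the plain-copy workaround. AsListInstantiation,
hasZeroArgConstructor, and toPlainArrayList are hypothetical names, not Hive or Kryo code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AsListInstantiation {
    // Returns true if the class declares a zero-argument constructor.
    static boolean hasZeroArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Workaround: copy into java.util.ArrayList, which has a public zero-arg constructor.
    static <T> List<T> toPlainArrayList(List<T> source) {
        return new ArrayList<>(source);
    }

    public static void main(String[] args) {
        List<String> fixed = Arrays.asList("a", "b");
        System.out.println(fixed.getClass().getName());                 // java.util.Arrays$ArrayList
        System.out.println(hasZeroArgConstructor(fixed.getClass()));    // false
        System.out.println(hasZeroArgConstructor(ArrayList.class));     // true
        System.out.println(toPlainArrayList(fixed).getClass().getName()); // java.util.ArrayList
    }
}
```

Copying at the call site avoids the problem for code we control, but it does not help with
user UDFs, which is why a custom serializer or a different instantiator strategy is still
needed.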

> Upgrade Kryo version to 3.0.x
> -----------------------------
>
>                 Key: HIVE-12175
>                 URL: https://issues.apache.org/jira/browse/HIVE-12175
>             Project: Hive
>          Issue Type: Improvement
>          Components: Serializers/Deserializers
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>             Fix For: 2.0.0
>
>         Attachments: HIVE-12175.1.patch, HIVE-12175.2.patch, HIVE-12175.3.patch, HIVE-12175.3.patch, HIVE-12175.4.patch, HIVE-12175.5.patch, HIVE-12175.6.patch
>
>
> The current version of Kryo (2.22) has an issue (see the exception below and HIVE-12174)
> with serializing ArrayLists generated using Arrays.asList(). We need to either replace all
> occurrences of Arrays.asList() or change the current StdInstantiatorStrategy. This issue
> is fixed in later versions, and the Kryo community recommends using DefaultInstantiatorStrategy
> with a fallback to StdInstantiatorStrategy. More discussion about this issue is at
> https://github.com/EsotericSoftware/kryo/issues/216.
> Alternatively, a custom serialization/deserialization class can be provided for Arrays.asList.
> Also, Kryo 3.0 introduced unsafe-based serialization, which claims to have much better
> performance for certain types of serialization.
> Exception:
> {code}
> Caused by: java.lang.NullPointerException
> 	at java.util.Arrays$ArrayList.size(Arrays.java:2847)
> 	at java.util.AbstractList.add(AbstractList.java:108)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	... 57 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
