spark-user mailing list archives

From Yutong Luo <y...@groupon.com>
Subject Re: UDF accessing hive struct array fails with buffer underflow from kryo
Date Thu, 11 Jun 2015 21:25:58 GMT
This is a Kryo issue: https://github.com/EsotericSoftware/kryo/issues/124.
It has to do with the lengths of the field names, and it is fixed in
Kryo 2.23.

What's weird is this doesn't break on Hive itself, only when using
SparkSQL. Attached is the full stacktrace. It might be how SparkSQL is
interacting with Hive that's making this break.
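
One way to narrow this down is to check which Kryo jar is actually loaded in the failing JVM. The following is a minimal sketch, not something from the original report: the class name KryoVersionProbe and the method describe are hypothetical, and it assumes (without confirmation) that a Kryo version difference between the two environments could be involved. It uses only reflection, so it compiles without Kryo on the classpath.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: reports where a class was loaded from and, if the jar
// manifest provides it, the implementation version.
class KryoVersionProbe {

    static String describe(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            URL location = (src == null) ? null : src.getLocation();
            Package pkg = cls.getPackage();
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " loaded from "
                + (location == null ? "<bootstrap or unknown>" : location.toString())
                + ", implementation version "
                + (version == null ? "<not in manifest>" : version);
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }

    public static void main(String[] args) {
        // The Kryo class name is taken from the stack trace in this thread.
        System.out.println(describe("com.esotericsoftware.kryo.Kryo"));
    }
}
```

Running the same check from the Hive side and from the Spark driver or executor side would show whether both resolve the same Kryo jar. Many jars do not record an implementation version in their manifest, so the jar path is usually the more useful part of the output.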

Breaking the aforementioned collection of structs into smaller structs, or
renaming the fields to be shorter, is an ugly workaround.
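
To pick rename candidates for that workaround, it can help to list the struct's field names with their lengths. The sketch below is only an illustration: the class StructFieldNames and its methods are hypothetical, the regex treats any identifier directly followed by ':' as a field name, and no particular length threshold is assumed, since the only information here is that the failure relates to field name lengths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts field names from a Hive type string.
class StructFieldNames {

    // In a Hive type string, a struct field name is an identifier followed by ':'.
    private static final Pattern FIELD =
        Pattern.compile("([A-Za-z_][A-Za-z0-9_]*):");

    static List<String> fieldNames(String hiveType) {
        List<String> names = new ArrayList<>();
        Matcher m = FIELD.matcher(hiveType);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Returns the longest name (the first one wins on ties), or null if empty.
    static String longest(List<String> names) {
        String best = null;
        for (String n : names) {
            if (best == null || n.length() > best.length()) {
                best = n;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Column type from the original message in this thread.
        String type = "array<struct<order_by_id:bigint,subscription_id:bigint,"
            + "unsubscribe_hash:string,country_id:int,optin_hash:string,"
            + "city_part_id:bigint,subscription_type:string,locale:string>>";
        for (String n : fieldNames(type)) {
            System.out.println(n.length() + "\t" + n);
        }
    }
}
```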


On Thu, May 28, 2015 at 3:21 PM, yluo <yluo@groupon.com> wrote:

> Hi all, I'm using Spark 1.3.1 with Hive 0.13.1. When running a UDF
> accessing a Hive struct array, the query fails with:
>
> Caused by: com.esotericsoftware.kryo.KryoException: Buffer underflow.
> Serialization trace:
> fieldName (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector$MyField)
> fields (org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector)
> listElementObjectInspector (org.apache.hadoop.hive.serde2.objectinspector.StandardListObjectInspector)
> argStructArrayOI (com.groupon.hive.udf.filter.StructStringMemberFilterUDF)
>         at com.esotericsoftware.kryo.io.Input.require(Input.java:156)
>         at com.esotericsoftware.kryo.io.Input.readAscii_slow(Input.java:580)
>         at com.esotericsoftware.kryo.io.Input.readAscii(Input.java:558)
>         at com.esotericsoftware.kryo.io.Input.readString(Input.java:436)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:157)
>         at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.read(DefaultSerializers.java:146)
>         at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:109)
>         at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:648)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
>         at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
>         at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:626)
>         at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:918)
>         ... 102 more
>
> Anyone seen anything similar? argStructArrayOI is a Hive
> ListObjectInspector. The field the argStructArrayOI is accessing looks
> like:
>
>
> array<struct<order_by_id:bigint,subscription_id:bigint,unsubscribe_hash:string,country_id:int,optin_hash:string,city_part_id:bigint,subscription_type:string,locale:string>>
>
> The table is a hive table.
>
> Running the same query on Hive works... what's going on here? Any
> suggestions on how to debug this?


-- 
Thanks,
Yutong
