hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12045) ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
Date Fri, 06 Nov 2015 14:42:27 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Rui Li updated HIVE-12045:
--------------------------
    Attachment: HIVE-12045.1-spark.patch

The error occurs when we call {{Utilities.getBaseWork}} in the driver. We add the custom jars
to Kryo's classloader at this point. However, Utilities only accepts local jars. And since
the jars will be uploaded to HDFS in yarn-cluster mode, they won't be accessible to Kryo.
Since we already download the jars when the job is submitted, we should keep the local URLs
in the conf, so that Utilities will have them added for Kryo. This is what my patch does.
I can pass the example queries with it.

Something I want to explain:
1. Why it only happens with distinct clause? Without distinct, the query doesn't need a distributed
job to run.
2. Why it only happens for generic UDF? I don't know for sure, but my guess is somehow only
generic UDFs are needed during deserializatoin on driver side.

> ClassNotFound for GenericUDF in "select distinct..." query (Hive on Spark)
> --------------------------------------------------------------------------
>
>                 Key: HIVE-12045
>                 URL: https://issues.apache.org/jira/browse/HIVE-12045
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>         Environment: Cloudera QuickStart VM - CDH5.4.2
> beeline
>            Reporter: Zsolt Tóth
>            Assignee: Rui Li
>         Attachments: HIVE-12045.1-spark.patch, example.jar
>
>
> If I execute the following query in beeline, I get ClassNotFoundException for the UDF
class.
> {code}
> drop function myGenericUdf;
> create function myGenericUdf as 'org.example.myGenericUdf' using jar 'hdfs:///tmp/myudf.jar';
> select distinct myGenericUdf(1,2,1) from mytable;
> {code}
> In my example, myGenericUdf just looks for the 1st argument's value in the others and
returns the index. I don't think this is related to the actual GenericUDF function.
> Note that:
> "select myGenericUdf(1,2,1) from mytable;" succeeds
> If I use the non-generic implementation of the same UDF, the select distinct call succeeds.
> StackTrace:
> {code}
> 15/10/06 05:20:25 ERROR exec.Utilities: Failed to load plan: hdfs://quickstart.cloudera:8020/tmp/hive/hive/f9de3f09-c12d-4528-9ee6-1f12932a14ae/hive_2015-10-06_05-20-07_438_6519207588897968406-20/-mr-10003/27cd7226-3e22-46f4-bddd-fb8fd4aa4b8d/map.xml:
org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.example.myGenericUDF
> Serialization trace:
> genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> chidren (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
> colExprMap (org.apache.hadoop.hive.ql.exec.GroupByOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
> 	at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> 	at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:99)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:112)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
> 	at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
> 	at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1069)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:960)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:974)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:416)
> 	at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:296)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:268)
> 	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:505)
> 	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> 	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> 	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> 	at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
> 	at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:80)
> 	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:206)
> 	at org.apache.spark.rdd.RDD$$anonfun$dependencies$2.apply(RDD.scala:204)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at org.apache.spark.rdd.RDD.dependencies(RDD.scala:204)
> 	at org.apache.spark.scheduler.DAGScheduler.visit$2(DAGScheduler.scala:338)
> 	at org.apache.spark.scheduler.DAGScheduler.getAncestorShuffleDependencies(DAGScheduler.scala:355)
> 	at org.apache.spark.scheduler.DAGScheduler.registerShuffleDependencies(DAGScheduler.scala:317)
> 	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getShuffleMapStage(DAGScheduler.scala:218)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:301)
> 	at org.apache.spark.scheduler.DAGScheduler$$anonfun$visit$1$1.apply(DAGScheduler.scala:298)
> 	at scala.collection.immutable.List.foreach(List.scala:318)
> 	at org.apache.spark.scheduler.DAGScheduler.visit$1(DAGScheduler.scala:298)
> 	at org.apache.spark.scheduler.DAGScheduler.getParentStages(DAGScheduler.scala:310)
> 	at org.apache.spark.scheduler.DAGScheduler.newStage(DAGScheduler.scala:244)
> 	at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:731)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
> 	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
> 	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> Caused by: java.lang.ClassNotFoundException: org.example.myGenericUDF
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> 	at java.lang.Class.forName0(Native Method)
> 	at java.lang.Class.forName(Class.java:270)
> 	at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
> 	... 72 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message