spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcelo Vanzin <van...@cloudera.com>
Subject Re: Spark SQL - Unable to use Hive UDF because of ClassNotFoundException
Date Sat, 31 Jan 2015 00:12:55 GMT
Hi Capitão,

Since you're using CDH, your question is probably more appropriate for
the cdh-user@cloudera.org list.

The problem you're seeing is most probably an artifact of the way CDH
is currently packaged. You have to add Hive jars manually to you Spark
app's classpath if you want to use the Hive support.

For example, if you're using CDH rpm/deb packages, you'd add all jars
in /usr/lib/hive/lib/ to `spark.driver.extraClassPath` and
`spark.executor.extraClassPath`, with the added caveat of excluding
the "guava" jar from that list.


On Fri, Jan 30, 2015 at 9:17 AM, Capitão <micael.capitao@gmail.com> wrote:
> I've been trying to run HiveQL queries with UDFs in Spark SQL, but with no
> success. The problem occurs only when using functions, like the
> from_unixtime (represented by the Hive class UDFFromUnixTime).
>
> I'm using Spark 1.2 with CDH5.3.0. Running the queries in local mode work,
> but in Yarn mode don't. I'm creating an uber-jar with all the needed
> dependencies, excluding the ones provided by the cluster (Spark, Hadoop) and
> including the Hive ones. When I run the queries in Yarn I get the following
> exception:
>
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due
> to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure:
> Lost task 1.3 in stage 0.0 (TID 20, <REMOVED>):
> java.lang.NoClassDefFoundError: Lorg/apache/hadoop/hive/ql/exec/UDF;
>         at java.lang.Class.getDeclaredFields0(Native Method)
>         at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
>         at java.lang.Class.getDeclaredField(Class.java:1951)
>         at
> java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)
>         at java.io.ObjectStreamClass.access$700(ObjectStreamClass.java:72)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:480)
>         at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
>         at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
>         at
> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
>         at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
>         at
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at
> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at
> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>         at
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>         at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
>         at org.apache.spark.scheduler.Task.run(Task.scala:56)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.ql.exec.UDF
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>
>
> First I investigated all the jar and classpath options with the spark-submit
> command: no luck.
> Then I tried to instantiate the UDF and UDFFromUnixTime classes using the
> environment that is set up for the executor. To do that, after letting the
> spark app fail, I went to one of the container's directory
> /mnt/sda/yarn/nm/usercache/altaia/appcache/application_1422464005963_0047/container_1422464005963_0047_01_000004/
> (for example) and from there I can see these files:
>
>     container_tokens
>     GeneralTest20140929-1.0.0-SNAPSHOT.jar   <=== this is my uber-jar
>     launch_container.sh
>     __spark__.jar
>     tmp
>
> Opening the launch_container.sh I checked all the classpath setup and to
> test if that classpath was correct when launching the JVM I replaced the
> "org.apache.spark.executor.CoarseGrainedExecutorBackend" class with a class
> of mine whose job is to print the classpath and instantiate, by reflection,
> the UDF and UDFFromUnixTime and all went well.
>
> I already tested having all the dependencies' jars in one directory on all
> hosts and adding that to the spark.executor.extraClassPath and
> spark.driver.extraClassPath: no luck either.
> At this stage I just think that the uber-jar and classpath are OK. I have no
> more clues of what can be happening. Maybe some classloader issue with Spark
> SQL?
> The ClassNotFoundException occurs when returning data back to the driver
> (because of the "ResultTask" seen in the stacktrace).
>
> Does anyone had such a similar issue?
>
> Regards.
>
>
>
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Unable-to-use-Hive-UDF-because-of-ClassNotFoundException-tp21443.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Mime
View raw message