spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Koert Kuipers <ko...@tresata.com>
Subject Re: java serialization errors with spark.files.userClassPathFirst=true
Date Fri, 16 May 2014 19:32:39 GMT
ok i put lots of logging statements in the ChildExecutorURLClassLoader.
this is what i see:

* the urls for userClassLoader are correct and includes only my one jar.

* for one class that only exists in my jar i see it gets loaded correctly
using userClassLoader

* for a class that exists in both my jar and spark kernel it tries to use
userClassLoader and ends up with a NoClassDefFoundError. the class is
org.apache.avro.mapred.AvroInputFormat and the NoClassDefFoundError is for
org.apache.hadoop.mapred.FileInputFormat (which the parentClassLoader is
responsible for since it is not in my jar). i currently catch this
NoClassDefFoundError and call parentClassLoader.loadClass but thats clearly
not a solution since it loads the wrong version.



On Fri, May 16, 2014 at 2:25 PM, Koert Kuipers <koert@tresata.com> wrote:

> well, i modified ChildExecutorURLClassLoader to also delegate to
> parentClassloader if NoClassDefFoundError is thrown... now i get yet
> another error. i am clearly missing something with these classloaders. such
> nasty stuff... giving up for now. just going to have to not use
> spark.files.userClassPathFirst=true for now, until i have more time to look
> at this.
>
> 14/05/16 13:58:59 ERROR Executor: Exception in task ID 3
> java.lang.ClassCastException: cannot assign instance of scala.None$ to
> field org.apache.spark.rdd.RDD.checkpointData of type scala.Option in
> instance of MyRDD
>         at
> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>         at
> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1995)
>
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at
> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>         at
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>         at
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>         at
> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>         at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>         at
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>
>
>
> On Fri, May 16, 2014 at 1:46 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> after removing all class paramater of class Path from my code, i tried
>> again. different but related eror when i set
>> spark.files.userClassPathFirst=true
>>
>> now i dont even use FileInputFormat directly. HadoopRDD does...
>>
>> 14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
>> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>         at
>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at
>> org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         at
>> org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:51)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:270)
>>         at
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:57)
>>         at
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
>>         at
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
>>         at
>> java.io.ObjectInputStream.readClass(ObjectInputStream.java:1481)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1331)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at
>> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>
>>
>>
>> On Thu, May 15, 2014 at 3:03 PM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> when i set spark.files.userClassPathFirst=true, i get java serialization
>>> errors in my tasks, see below. when i set userClassPathFirst back to its
>>> default of false, the serialization errors are gone. my spark.serializer is
>>> KryoSerializer.
>>>
>>> the class org.apache.hadoop.fs.Path is in the spark assembly jar, but
>>> not in my task jars (the ones i added to the SparkConf). so looks like the
>>> ClosureSerializer is having trouble with this class once the
>>> ChildExecutorURLClassLoader is used? thats me just guessing.
>>>
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>> due to stage failure: Task 1.0:5 failed 4 times, most recent failure:
>>> Exception failure in TID 31 on host node05.tresata.com:
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
>>>         java.lang.Class.getDeclaredConstructors0(Native Method)
>>>         java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
>>>         java.lang.Class.getDeclaredConstructors(Class.java:1838)
>>>
>>> java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1697)
>>>         java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:50)
>>>         java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:203)
>>>         java.security.AccessController.doPrivileged(Native Method)
>>>
>>> java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:200)
>>>
>>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:556)
>>>
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1580)
>>>
>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
>>>
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
>>>
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>
>>> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>         sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>         java.lang.reflect.Method.invoke(Method.java:597)
>>>
>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>
>>> org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:66)
>>>
>>> org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:139)
>>>
>>> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:82)
>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:190)
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>         java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>
>

Mime
View raw message