spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: java serialization errors with spark.files.userClassPathFirst=true
Date Fri, 16 May 2014 19:54:12 GMT
ok i think the issue is visibility: a classloader can see all classes
loaded by its parent classloader. but userClassLoader does not have a
parent classloader, so it's not able to "see" any classes that
parentClassLoader is responsible for. in my case userClassLoader is trying
to load AvroInputFormat, which probably somewhere statically references
FileInputFormat, which is invisible to userClassLoader.
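the visibility rule above can be demonstrated outside spark with a plain URLClassLoader whose parent is null. this is a minimal standalone sketch, not spark's ChildExecutorURLClassLoader; the class name is just for illustration:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class Visibility {
    public static void main(String[] args) throws Exception {
        // A loader with no URLs and a null parent delegates only to the
        // bootstrap loader: core JDK classes resolve, classes on the
        // application class path do not.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            // visible: java.lang.String lives in the bootstrap loader,
            // which every loader can ultimately reach
            System.out.println(isolated.loadClass("java.lang.String"));
            try {
                // not visible: this very class sits on the application
                // class path, and the isolated loader has no parent link
                // to the loader responsible for it
                isolated.loadClass("Visibility");
            } catch (ClassNotFoundException e) {
                System.out.println("cannot see: " + e.getMessage());
            }
        }
    }
}
```

a user-jar loader with no parent link is in the same position: any class only the parent loaded (FileInputFormat in the traces below) is unreachable while it tries to define a class (AvroInputFormat) that references it.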


On Fri, May 16, 2014 at 3:32 PM, Koert Kuipers <koert@tresata.com> wrote:

> ok i put lots of logging statements in the ChildExecutorURLClassLoader.
> this is what i see:
>
> * the urls for userClassLoader are correct and includes only my one jar.
>
> * for one class that only exists in my jar i see it gets loaded correctly
> using userClassLoader
>
> * for a class that exists in both my jar and spark kernel it tries to use
> userClassLoader and ends up with a NoClassDefFoundError. the class is
> org.apache.avro.mapred.AvroInputFormat and the NoClassDefFoundError is for
> org.apache.hadoop.mapred.FileInputFormat (which the parentClassLoader is
> responsible for since it is not in my jar). i currently catch this
> NoClassDefFoundError and call parentClassLoader.loadClass but that's clearly
> not a solution since it loads the wrong version.
>
>
>
> On Fri, May 16, 2014 at 2:25 PM, Koert Kuipers <koert@tresata.com> wrote:
>
>> well, i modified ChildExecutorURLClassLoader to also delegate to
>> parentClassLoader if NoClassDefFoundError is thrown... now i get yet
>> another error. i am clearly missing something with these classloaders. such
>> nasty stuff... giving up for now. just going to have to not use
>> spark.files.userClassPathFirst=true for now, until i have more time to look
>> at this.
>>
>> 14/05/16 13:58:59 ERROR Executor: Exception in task ID 3
>> java.lang.ClassCastException: cannot assign instance of scala.None$ to
>> field org.apache.spark.rdd.RDD.checkpointData of type scala.Option in
>> instance of MyRDD
>>         at
>> java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
>>         at
>> java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1995)
>>
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at
>> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at
>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>         at
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>         at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>         at
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>         at
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>         at
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>
>>
>>
>> On Fri, May 16, 2014 at 1:46 PM, Koert Kuipers <koert@tresata.com> wrote:
>>
>>> after removing all class parameters of type Path from my code, i tried
>>> again. different but related error when i set
>>> spark.files.userClassPathFirst=true
>>>
>>> now i don't even use FileInputFormat directly. HadoopRDD does...
>>>
>>> 14/05/16 12:17:17 ERROR Executor: Exception in task ID 45
>>> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/FileInputFormat
>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
>>>         at
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>>         at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>         at
>>> org.apache.spark.executor.ChildExecutorURLClassLoader$userClassLoader$.findClass(ExecutorURLClassLoader.scala:42)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>         at
>>> org.apache.spark.executor.ChildExecutorURLClassLoader.findClass(ExecutorURLClassLoader.scala:51)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>         at java.lang.Class.forName0(Native Method)
>>>         at java.lang.Class.forName(Class.java:270)
>>>         at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:57)
>>>         at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
>>>         at
>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
>>>         at
>>> java.io.ObjectInputStream.readClass(ObjectInputStream.java:1481)
>>>         at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1331)
>>>         at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>         at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>         at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
>>>         at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
>>>         at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>         at
>>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>         at
>>> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>         at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>         at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at
>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>         at
>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
>>>         at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>         at
>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
>>>
>>>
>>>
>>> On Thu, May 15, 2014 at 3:03 PM, Koert Kuipers <koert@tresata.com>wrote:
>>>
>>>> when i set spark.files.userClassPathFirst=true, i get java
>>>> serialization errors in my tasks, see below. when i set userClassPathFirst
>>>> back to its default of false, the serialization errors are gone. my
>>>> spark.serializer is KryoSerializer.
>>>>
>>>> the class org.apache.hadoop.fs.Path is in the spark assembly jar, but
>>>> not in my task jars (the ones i added to the SparkConf). so it looks like
>>>> the ClosureSerializer is having trouble with this class once the
>>>> ChildExecutorURLClassLoader is used? that's just me guessing.
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted
>>>> due to stage failure: Task 1.0:5 failed 4 times, most recent failure:
>>>> Exception failure in TID 31 on host node05.tresata.com:
>>>> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/Path
>>>>         java.lang.Class.getDeclaredConstructors0(Native Method)
>>>>         java.lang.Class.privateGetDeclaredConstructors(Class.java:2398)
>>>>         java.lang.Class.getDeclaredConstructors(Class.java:1838)
>>>>
>>>> java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1697)
>>>>         java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:50)
>>>>         java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:203)
>>>>         java.security.AccessController.doPrivileged(Native Method)
>>>>
>>>> java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:200)
>>>>
>>>> java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:556)
>>>>
>>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1580)
>>>>
>>>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1493)
>>>>
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1729)
>>>>
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>
>>>> scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>>         sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         java.lang.reflect.Method.invoke(Method.java:597)
>>>>
>>>> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>>
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>>
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>
>>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>
>>>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>
>>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>>
>>>> org.apache.spark.scheduler.ShuffleMapTask$.deserializeInfo(ShuffleMapTask.scala:66)
>>>>
>>>> org.apache.spark.scheduler.ShuffleMapTask.readExternal(ShuffleMapTask.scala:139)
>>>>
>>>> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>>
>>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>>
>>>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>
>>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:60)
>>>>
>>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:82)
>>>>
>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:190)
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>>         java.lang.Thread.run(Thread.java:662)
>>>>
>>>>
>>>
>>
>

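the ClassCastException earlier in the thread ("cannot assign instance of scala.None$ to field ... of type scala.Option") is the classic symptom of the same class being defined by two different loaders: to the JVM, class identity is the pair (name, defining loader). a minimal standalone sketch of that effect follows; the names are made up for illustration and this is not spark code:

```java
import java.io.InputStream;

public class DupDemo {
    // the class we will deliberately load twice
    public static class Payload {}

    // a child-first loader: it defines Payload from its own copy of the
    // bytes instead of delegating; everything else goes to the parent
    static class ChildFirst extends ClassLoader {
        ChildFirst(ClassLoader parent) { super(parent); }
        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (name.equals(Payload.class.getName())) {
                String path = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(path)) {
                    byte[] bytes = in.readAllBytes();
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (java.io.IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> dup = new ChildFirst(DupDemo.class.getClassLoader())
                .loadClass(Payload.class.getName());
        // same fully qualified name, different defining loader:
        // the JVM treats them as two unrelated classes
        System.out.println(dup == Payload.class);      // false
        Object o = dup.getDeclaredConstructor().newInstance();
        System.out.println(o instanceof Payload);      // false
        // assigning o to a field declared with type Payload would throw
        // ClassCastException, analogous to the scala.Option failure above
    }
}
```

this is consistent with what the thread observes: once the fallback loads a second copy of a spark/scala class through the other loader, deserialized values no longer match the field types the executor's own copy declares.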