spark-user mailing list archives

From Jakob Odersky <ja...@odersky.com>
Subject Re: ClassNotFoundException in RDD.map
Date Thu, 17 Mar 2016 21:14:25 GMT
The error is very strange indeed; however, without code that reproduces
it, we can't really provide much help beyond speculation.

One thing that stood out to me immediately is that you say you have an
RDD of Any where every Any should be a BigDecimal, so why not specify
that type information?
When you use Any, a whole class of errors that the typechecker would
normally catch can slip through.
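
For example, a minimal sketch of what I mean (the name "predictions" is
made up and stands in for your RDD[(Any, Double)]; it assumes the ids
really are scala.math.BigDecimal):

    import org.apache.spark.rdd.RDD

    // Narrow the key type as early as possible so the compiler can
    // check every later transformation against BigDecimal.
    def typed(predictions: RDD[(Any, Double)]): RDD[(BigDecimal, Double)] =
      predictions.map { case (id, score) => (id.asInstanceOf[BigDecimal], score) }

    // From here on the key type is known, e.g.
    // typed(predictions).map(_._1.precision) only compiles because the
    // key is a BigDecimal, not an Any.

Better still, keep the RDD typed as (BigDecimal, Double) from the point
where it is produced, so no cast is needed at all.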

On Thu, Mar 17, 2016 at 10:25 AM, Dirceu Semighini Filho
<dirceu.semighini@gmail.com> wrote:
> Hi Ted, thanks for answering.
> The map is just that; whatever I try inside the map throws this
> ClassNotFoundException, even if I just do map(f => f).
> What is bothering me is that when I do a take or a first it returns the
> result, which makes me conclude that the previous code isn't wrong.
>
> Kind Regards,
> Dirceu
>
>
> 2016-03-17 12:50 GMT-03:00 Ted Yu <yuzhihong@gmail.com>:
>>
>> bq. $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>
>> Do you mind showing more of your code involving the map() ?
>>
>> On Thu, Mar 17, 2016 at 8:32 AM, Dirceu Semighini Filho
>> <dirceu.semighini@gmail.com> wrote:
>>>
>>> Hello,
>>> I found a strange behavior after executing a prediction with MLlib.
>>> My code returns an RDD[(Any, Double)] where Any is the id of my dataset,
>>> which is a BigDecimal, and Double is the prediction for that line.
>>> When I run
>>> myRdd.take(10) it returns fine:
>>> res16: Array[_ >: (Double, Double) <: (Any, Double)] =
>>> Array((1921821857196754403.000000000000000000,0.1690292052496703),
>>> (4545756323742222427.000000000000000000,0.16902820241892452),
>>> (989198096568001939.000000000000000000,0.16903432789699502),
>>> (14284129652106187990.000000000000000000,0.16903517653451386),
>>> (17980228074225252497.000000000000000000,0.16903151028332508),
>>> (3861345958263692781.000000000000000000,0.16903056986183976),
>>> (17558198701997383205.000000000000000000,0.1690295450319745),
>>> (10651576092054552310.000000000000000000,0.1690286445174418),
>>> (4534494349035056215.000000000000000000,0.16903303401862327),
>>> (5551671513234217935.000000000000000000,0.16902303368995966))
>>> But when I try to run a map on it:
>>> myRdd.map(_._1).take(10)
>>> it throws a ClassNotFoundException (wrapped in a SparkException):
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
>>> in stage 72.0 failed 4 times, most recent failure: Lost task 0.3 in stage
>>> 72.0 (TID 1774, 172.31.23.208): java.lang.ClassNotFoundException:
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Driver stacktrace:
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
>>> at
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
>>> at scala.Option.foreach(Option.scala:236)
>>> at
>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
>>> at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
>>> at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
>>> at
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
>>> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>>> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1837)
>>> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1850)
>>> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1302)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>>> at org.apache.spark.rdd.RDD.take(RDD.scala:1276)
>>> at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1316)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>> at
>>> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1315)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
>>> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
>>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
>>> at $iwC$$iwC$$iwC.<init>(<console>:47)
>>> at $iwC$$iwC.<init>(<console>:49)
>>> at $iwC.<init>(<console>:51)
>>> at <init>(<console>:53)
>>> at .<init>(<console>:57)
>>> at .<clinit>(<console>)
>>> at .<init>(<console>:7)
>>> at .<clinit>(<console>)
>>> at $print(<console>)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:497)
>>> at
>>> org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>> at
>>> org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
>>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>> at
>>> org.apache.zeppelin.spark.IUberSparkInterpreter.interpretInput(IUberSparkInterpreter.java:865)
>>> at
>>> org.apache.zeppelin.spark.IUberSparkInterpreter.interpret(IUberSparkInterpreter.java:830)
>>> at
>>> org.apache.zeppelin.spark.IUberSparkInterpreter.interpret(IUberSparkInterpreter.java:818)
>>> at
>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>> at
>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>> at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:331)
>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:171)
>>> at
>>> org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.ClassNotFoundException:
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> at java.security.AccessController.doPrivileged(Native Method)
>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>> at java.lang.Class.forName0(Native Method)
>>> at java.lang.Class.forName(Class.java:278)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
>>> at
>>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at
>>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>>> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
>>> at
>>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>> at
>>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
>>> at
>>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>>> at org.apache.spark.scheduler.Task.run(Task.scala:88)
>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> ... 1 more
>>>
>>> I'm using Spark 1.5.2 compiled with Hadoop 2.6.
>>> Does anybody know how to avoid this exception?
>>>
>>> Kind Regards,
>>> Dirceu
>>
>>
>


