spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerard Maas <gerard.m...@gmail.com>
Subject Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)
Date Wed, 21 May 2014 14:25:40 GMT
Hi Tobias,

For your simple example, I just used sbt package, but for more complex jobs
that have external dependencies, either:
 - you should use sbt assembly [1] or mvn shade plugin [2] to build a "fat
jar" (aka jar-with-dependencies)
 - or  provide a list of jars including your job jar along with any
additional dependency: (e.g.
.setJars(Seq("sparkexample_2.10-0.1.jar","dependency1.jar",
"dependency2.jar", ... ,"dependencyN.jar"))

-kr, Gerard.

[1] https://github.com/sbt/sbt-assembly
[2] http://maven.apache.org/plugins/maven-shade-plugin


On Wed, May 21, 2014 at 4:04 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Gerard,
>
> thanks very much for your investigation! After hours of trial and
> error, I am kind of happy to hear it is not just a broken setup on my
> side that's causing the error.
>
> Could you explain briefly how you created that simple jar file?
>
> Thanks,
> Tobias
>
> On Wed, May 21, 2014 at 9:47 PM, Gerard Maas <gerard.maas@gmail.com>
> wrote:
> > Hi Tobias,
> >
> > I was curious about this issue and tried to run your example on my local
> > Mesos. I was able to reproduce your issue using your current config:
> >
> > [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
> > 1.0:4 failed 4 times (most recent failure: Exception failure:
> > java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
> > org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times
> > (most recent failure: Exception failure:
> java.lang.ClassNotFoundException:
> > spark.SparkExamplesMinimal$$anonfun$2)
> > at
> >
> org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
> >
> > Creating a simple jar from the job and providing it through the
> > configuration seems to solve it:
> >
> > val conf = new SparkConf()
> >       .setMaster("mesos://<my_ip>:5050/")
> >
> >
> .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
> >       .setAppName("SparkExamplesMinimal")
> >
> > Resulting in:
> > 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
> > 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at
> > SparkExamplesMinimal.scala:50) finished in 1.120 s
> > 14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at
> > SparkExamplesMinimal.scala:50, took 1.177091435 s
> > count: 1000000
> >
> > Why the closure serialization does not work with Mesos is beyond my
> current
> > knowledge.
> > Would be great to hear from the experts (cross-posting to dev for that)
> >
> > -kr, Gerard.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer <tgp@preferred.jp>
> wrote:
> >>
> >> Hi,
> >>
> >> I have set up a cluster with Mesos (backed by Zookeeper) with three
> >> master and three slave instances. I set up Spark (git HEAD) for use
> >> with Mesos according to this manual:
> >> http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
> >>
> >> Using the spark-shell, I can connect to this cluster and do simple RDD
> >> operations, but the same code in a Scala class and executed via sbt
> >> run-main works only partially. (That is, count() works, count() after
> >> flatMap() does not.)
> >>
> >> Here is my code:
> https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
> >> The file SparkExamplesScript.scala, when pasted into spark-shell,
> >> outputs the correct count() for the parallelized list comprehension,
> >> as well as for the flatMapped RDD.
> >>
> >> The file SparkExamplesMinimal.scala contains exactly the same code,
> >> and also the MASTER configuration and the Spark Executor are the same.
> >> However, while the count() for the parallelized list is displayed
> >> correctly, I receive the following error when asking for the count()
> >> of the flatMapped RDD:
> >>
> >> -----------------
> >>
> >> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
> >> (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
> >> has no missing parents
> >> 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
> >> tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
> >> SparkExamplesMinimal.scala:34)
> >> 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
> >> 1.0 with 8 tasks
> >> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
> >> as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
> >> (PROCESS_LOCAL)
> >> 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
> >> as 1779147 bytes in 37 ms
> >> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
> >> 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
> >> java.lang.ClassNotFoundException
> >> java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> >> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> >> at java.security.AccessController.doPrivileged(Native Method)
> >> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> >> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> >> at java.lang.Class.forName0(Native Method)
> >> at java.lang.Class.forName(Class.java:270)
> >> at
> >>
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
> >> at
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
> >> at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
> >> at
> >>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >> at
> >> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >> at
> >>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >> at
> >>
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >> at
> >>
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
> >> at
> >> org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
> >> at
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
> >> at
> >>
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> >> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >> at
> >>
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
> >> at
> >>
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >>
> >> -----------------
> >>
> >> Can anyone explain to me where this comes from or how I might further
> >> track the problem down?
> >>
> >> Thanks,
> >> Tobias
> >
> >
>

Mime
View raw message