spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dominik Safaric (JIRA)" <>
Subject [jira] [Created] (SPARK-19785) java.lang.ClassNotFoundException - Scala anonymous function
Date Wed, 01 Mar 2017 12:29:45 GMT
Dominik Safaric created SPARK-19785:

             Summary: java.lang.ClassNotFoundException - Scala anonymous function
                 Key: SPARK-19785
             Project: Spark
          Issue Type: Question
          Components: Deploy, Spark Core, Spark Submit
    Affects Versions: 2.1.0
         Environment: Ubuntu 16.04.1 LTS
            Reporter: Dominik Safaric

I've been trying to submit a Spark Streaming application using spark-submit to a cluster of
mine consisting of a master and two worker nodes. The application has been written in Scala,
and build using Maven. Importantly, the Maven build is configured to produce a fat JAR containing
all dependencies. Furthermore, the JAR has been distributed to all of nodes. The streaming
job has been submitted using the following command: 

bin/spark-submit --class topology.SimpleProcessingTopology --jars /tmp/spark_streaming-1.0-SNAPSHOT.jar
--master spark:// --verbose /tmp/spark_streaming-1.0-SNAPSHOT.jar /tmp/

where is the IP address of the master node within the VNET. 

However, I keep getting the following exception while starting the streaming application:

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)

Caused by: java.lang.ClassNotFoundException: topology.SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1
	at java.lang.ClassLoader.loadClass(
	at java.lang.ClassLoader.loadClass(
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)

I've checked the content of the JAR using jar tvf and as you can see in the output below,
it does contain the class in question.

  1735 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1.class
   702 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology.class
  2415 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.class
  2500 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$$anonfun$main$1$$anonfun$apply$1.class
  7045 Wed Mar 01 12:29:20 UTC 2017 topology/SimpleProcessingTopology$.class

This exception has been caused due to the anonymous function of the foreachPartition call:

rdd.foreachPartition(partition => {
        val outTopic = props.getString("application.simple.kafka.out.topic")
        val producer = new KafkaProducer[Array[Byte],Array[Byte]](kafkaParams)
        partition.foreach(record => {
          val producerRecord = new ProducerRecord[Array[Byte], Array[Byte]](outTopic, record.key(),

Unfortunately, I am not able to find the root cause of this since so far. Hence, I would appreciate
if anyone could help me out fixing this issue. 

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message