spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: ClassNotFoundException was thrown while trying to save RDD
Date Mon, 13 Oct 2014 06:53:32 GMT
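Adding your application jar to the SparkContext should resolve this issue:
the executors need that jar to load the classes compiled from your code,
such as the anonymous-function class named in the exception.

E.g.: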
sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
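Why the jar matters: by scalac's naming convention, com.xt.scala.HBaseApp$$anonfun$testHBase$1 is a synthetic class compiled from an anonymous function inside testHBase. Spark serializes each task, including such function objects, with Java serialization, and the executor rebuilds it by looking classes up with Class.forName (the resolveClass frame in the quoted stack trace). If the executor's class loader cannot see the application jar, that lookup fails. The sketch below reproduces the effect without Spark; ClosureShippingDemo, shipAndRun and the "bare" class loader are illustrative stand-ins, not Spark APIs.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}
import java.net.{URL, URLClassLoader}

object ClosureShippingDemo {
  // An anonymous function; scalac compiles it into a synthetic class
  // (on Scala 2.10 it is named like ClosureShippingDemo$$anonfun$1).
  val addOne: Int => Int = (x: Int) => x + 1

  // "Driver side": plain Java serialization to bytes.
  def serialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // "Executor side": resolve every class name through the given loader,
  // similar to the resolveClass frame in the quoted stack trace.
  def deserialize(data: Array[Byte], loader: ClassLoader): AnyRef = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
      override def resolveClass(desc: ObjectStreamClass): Class[_] =
        Class.forName(desc.getName, false, loader)
    }
    try in.readObject() finally in.close()
  }

  // Ship addOne through serialization and apply it to 41 on the "executor".
  def shipAndRun(loader: ClassLoader): Either[ClassNotFoundException, Int] =
    try {
      val f = deserialize(serialize(addOne), loader).asInstanceOf[Int => Int]
      Right(f(41))
    } catch {
      case e: ClassNotFoundException => Left(e)
    }

  def main(args: Array[String]): Unit = {
    // Loader that can see this program's classes: like an executor that has the app jar.
    println(shipAndRun(getClass.getClassLoader))
    // Loader with only JDK bootstrap classes: like an executor without the app jar.
    val bare = new URLClassLoader(Array.empty[URL], null: ClassLoader)
    println(shipAndRun(bare))
  }
}
```

Run on its own, the first call should print Right(42) and the second a Left wrapping a ClassNotFoundException for one of the demo's classes. The exact class name depends on the Scala version, since Scala 2.12+ compiles lambdas differently from the $$anonfun classes of Scala 2.10.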

Thanks
Best Regards

On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao <xiaotao.cs.nju@gmail.com> wrote:

> At first I tried to read from HBase and found that the exception was
> thrown, so I started debugging the app. I removed the code that reads HBase
> and tried to save an RDD containing a list, and the exception was still
> thrown. So I'm sure the exception was not caused by reading HBase.
>
> While debugging, I did not change the object name or the file name.
>
>
>
> 2014-10-13 0:00 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>
>> Your app is named scala.HBaseApp.
>> Does it read from or write to HBase?
>>
>> Just curious.
>>
>> On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao <xiaotao.cs.nju@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'm using CDH 5.0.1 (Spark 0.9) and submitting a job to a Spark
>>> standalone cluster.
>>>
>>> The job is quite simple:
>>>
>>>   package com.xt.scala
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>
>>>   object HBaseApp {
>>>     def main(args: Array[String]): Unit = {
>>>       testHBase("student", "/test/xt/saveRDD")
>>>     }
>>>
>>>     def testHBase(tableName: String, outFile: String): Unit = {
>>>       val sparkConf = new SparkConf()
>>>         .setAppName("-- Test HBase --")
>>>         .set("spark.executor.memory", "2g")
>>>         .set("spark.cores.max", "16")
>>>
>>>       val sparkContext = new SparkContext(sparkConf)
>>>
>>>       val rdd = sparkContext.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), 3)
>>>
>>>       val c = rdd.count()  // succeeds
>>>       println("\n\n\n" + c + "\n\n\n")
>>>
>>>       // The next line throws
>>>       // java.lang.ClassNotFoundException: com.xt.scala.HBaseApp$$anonfun$testHBase$1
>>>       rdd.saveAsTextFile(outFile)
>>>
>>>       println("\n  down  \n")
>>>     }
>>>   }
>>>
>>> I submitted this job using the following script:
>>>
>>> #!/bin/bash
>>>
>>> HBASE_CLASSPATH=$(hbase classpath)
>>> APP_JAR=/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar
>>>
>>> SPARK_ASSEMBLY_JAR=/usr/games/spark/xt/spark-assembly_2.10-0.9.0-cdh5.0.1-hadoop2.3.0-cdh5.0.1.jar
>>> SPARK_MASTER=spark://b02.jsepc.com:7077
>>>
>>> CLASSPATH=$CLASSPATH:$APP_JAR:$SPARK_ASSEMBLY_JAR:$HBASE_CLASSPATH
>>> export SPARK_CLASSPATH=/usr/lib/hbase/lib/*
>>>
>>> CONFIG_OPTS="-Dspark.master=$SPARK_MASTER"
>>>
>>> java -cp "$CLASSPATH" $CONFIG_OPTS com.xt.scala.HBaseApp "$@"
>>>
>>> After I submitted the job, the count of the RDD was computed
>>> successfully, but the RDD could not be saved to HDFS; the following
>>> exception was thrown:
>>>
>>> 14/10/11 16:09:33 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
>>> java.lang.ClassNotFoundException: com.xt.scala.HBaseApp$$anonfun$testHBase$1
>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>  at java.lang.Class.forName0(Native Method)
>>>  at java.lang.Class.forName(Class.java:270)
>>>  at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>>>  at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>>>  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>  at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>  at java.lang.reflect.Method.invoke(Method.java:606)
>>>  at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>  at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>>>  at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>>>  at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>>>  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>>>  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>>>  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
>>>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
>>>  at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>>>  at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
>>>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>  at java.lang.Thread.run(Thread.java:744)
>>>
>>>
>>>
>>> I also noticed that if I add "-Dspark.jars=$APP_JAR" to the variable
>>> *CONFIG_OPTS*, i.e.,
>>> CONFIG_OPTS="-Dspark.master=$SPARK_MASTER -Dspark.jars=$APP_JAR",
>>> the job finishes successfully and the RDD is written to HDFS.
>>> So what does "java.lang.ClassNotFoundException:
>>> com.xt.scala.HBaseApp$$anonfun$testHBase$1" mean, and why is it
>>> thrown?
>>>
>>> Thanks
>>>
>>>
>>
>
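The observation that -Dspark.jars=$APP_JAR fixes the job matches the reply above: both approaches ship the application jar to the executors. A minimal sketch of doing the same thing in code, assuming Spark 0.9 or later on the classpath, is to pass the jar path to SparkConf.setJars (which sets the spark.jars property) before creating the SparkContext. The jar path is APP_JAR from the quoted script; SaveWithJars and buildConf are hypothetical names, and the master URL is still assumed to come from -Dspark.master as in that script.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SaveWithJars {
  // APP_JAR from the quoted submit script; adjust to wherever your build puts the jar.
  val appJar = "/usr/games/spark/xt/SparkDemo-0.0.1-SNAPSHOT.jar"

  // Hypothetical helper: the same settings as testHBase, plus the jar list.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("-- Test HBase --")
      .set("spark.executor.memory", "2g")
      .set("spark.cores.max", "16")
      // Same effect as -Dspark.jars=$APP_JAR: executors fetch this jar and can
      // then load classes such as HBaseApp$$anonfun$testHBase$1.
      .setJars(Seq(appJar))

  def main(args: Array[String]): Unit = {
    // spark.master is picked up from the -Dspark.master system property.
    val sc = new SparkContext(buildConf())
    try {
      sc.parallelize(1 to 10, 3).saveAsTextFile("/test/xt/saveRDD")
    } finally {
      sc.stop()
    }
  }
}
```

Setting the jar on the SparkConf keeps the fix with the application instead of the launch script; calling sparkContext.addJar after the context is created, as in the reply, is the other option.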
