spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Wendell <pwend...@gmail.com>
Subject Re: Strange problem with saveAsTextFile after upgrade Spark 0.9.0->1.0.0
Date Sun, 08 Jun 2014 20:02:07 GMT
Paul,

Could you give the version of Java that you are building with and the
version of Java you are running with? Are they the same?

Just off the cuff, I wonder if this is related to:
https://issues.apache.org/jira/browse/SPARK-1520

If it is, it could appear that certain functions are not in the jar
because they go beyond the extended zip boundary `jar tvf` won't list
them.

- Patrick

On Sun, Jun 8, 2014 at 12:45 PM, Paul Brown <prb@mult.ifario.us> wrote:
> Moving over to the dev list, as this isn't a user-scope issue.
>
> I just ran into this issue with the missing saveAsTestFile, and here's a
> little additional information:
>
> - Code ported from 0.9.1 up to 1.0.0; works with local[n] in both cases.
> - Driver built as an uberjar via Maven.
> - Deployed to smallish EC2 cluster in standalone mode (S3 storage) with
> Spark 1.0.0-hadoop1 downloaded from Apache.
>
> Given that it functions correctly in local mode but not in a standalone
> cluster, this suggests to me that the issue is in a difference between the
> Maven version and the hadoop1 version.
>
> In the spirit of taking the computer at its word, we can just have a look
> in the JAR files.  Here's what's in the Maven dep as of 1.0.0:
>
> jar tvf
> ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar
> | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
>
>
> And here's what's in the hadoop1 distribution:
>
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar| grep 'rdd/RDD' | grep 'saveAs'
>
>
> I.e., it's not there.  It is in the hadoop2 distribution:
>
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar| grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014
> org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
>
>
> So something's clearly broken with the way that the distribution assemblies
> are created.
>
> FWIW and IMHO, the "right" way to publish the hadoop1 and hadoop2 flavors
> of Spark to Maven Central would be as *entirely different* artifacts
> (spark-core-h1, spark-core-h2).
>
> Logged as SPARK-2075 <https://issues.apache.org/jira/browse/SPARK-2075>.
>
> Cheers.
> -- Paul
>
>
>
> --
> prb@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
>
> On Fri, Jun 6, 2014 at 2:45 AM, HenriV <henri.vanhove@vdab.be> wrote:
>
>> I'm experiencing the same error while upgrading from 0.9.1 to 1.0.0.
>> Im using google compute engine and cloud storage. but saveAsTextFile is
>> returning errors while saving in the cloud or saving local. When i start a
>> job in the cluster i do get an error but after this error it keeps on
>> running fine untill the saveAsTextFile. ( I don't know if the two are
>> connected)
>>
>> -----------Error at job startup-------
>>  ERROR metrics.MetricsSystem: Sink class
>> org.apache.spark.metrics.sink.MetricsServlet cannot be instantialized
>> java.lang.reflect.InvocationTargetException
>>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>> Method)
>>         at
>>
>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>         at
>>
>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>         at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>         at
>>
>> org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:136)
>>         at
>>
>> org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:130)
>>         at
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>         at
>> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>>         at
>> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>>         at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>>         at
>>
>> org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:130)
>>         at
>> org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:84)
>>         at
>>
>> org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:167)
>>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:230)
>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
>>         at Hello$.main(Hello.scala:101)
>>         at Hello.main(Hello.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at sbt.Run.invokeMain(Run.scala:72)
>>         at sbt.Run.run0(Run.scala:65)
>>         at sbt.Run.sbt$Run$$execute$1(Run.scala:54)
>>         at sbt.Run$$anonfun$run$1.apply$mcV$sp(Run.scala:58)
>>         at sbt.Run$$anonfun$run$1.apply(Run.scala:58)
>>         at sbt.Run$$anonfun$run$1.apply(Run.scala:58)
>>         at sbt.Logger$$anon$4.apply(Logger.scala:90)
>>         at sbt.TrapExit$App.run(TrapExit.scala:244)
>>         at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.lang.NoSuchMethodError:
>> com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
>>         at
>> com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:445)
>>         at
>> com.fasterxml.jackson.databind.ObjectMapper.<init>(ObjectMapper.java:366)
>>         at
>>
>> org.apache.spark.metrics.sink.MetricsServlet.<init>(MetricsServlet.scala:45)
>>         ... 31 more
>>
>> then it runs fine till i get to saveAsTextFile
>>
>> 14/06/06 09:05:12 INFO scheduler.TaskSetManager: Loss was due to
>> java.lang.ClassNotFoundException:
>> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1 [duplicate 17]
>> 14/06/06 09:05:12 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile
>> at Hello.scala:123
>> 14/06/06 09:05:12 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
>> [error] (run-main-0) org.apache.spark.SparkException: Job aborted due to
>> stage failure: Task 0.0:3 failed 4 times, most recent failure: Exception
>> failure in TID 142 on host sparky-s1.c.quick-heaven-560.internal:
>> java.lang.ClassNotFoundException:
>> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
>> [error]         java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>> [error]         java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>> [error]         java.security.AccessController.doPrivileged(Native Method)
>> [error]         java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>> [error]         java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>> [error]         java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>> [error]         java.lang.Class.forName0(Native Method)
>> [error]         java.lang.Class.forName(Class.java:270)
>> [error]
>>
>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
>> [error]
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>> [error]
>> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>> [error]
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>> [error]
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> [error]
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>> [error]
>> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>> [error]
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>> [error]
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> [error]
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> [error]
>>
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>> [error]
>> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>> [error]
>> org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>> [error]
>> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>> [error]
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>> [error]
>> java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>> [error]
>> java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>> [error]
>>
>> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>> [error]
>>
>> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>> [error]
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
>> [error]
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> [error]
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> [error]         java.lang.Thread.run(Thread.java:744)
>>
>> Thanks for any help or guidance.
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Strange-problem-with-saveAsTextFile-after-upgrade-Spark-0-9-0-1-0-0-tp6832p7122.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>

Mime
View raw message