spark-issues mailing list archives

From "Pat Ferrel (JIRA)" <>
Subject [jira] [Commented] (SPARK-2075) Anonymous classes are missing from Spark distribution
Date Tue, 21 Oct 2014 00:56:34 GMT


Pat Ferrel commented on SPARK-2075:

Is there any more on this?

I built Spark from the 1.1.0 tarball for Hadoop 1.2.1, and all is well there. I am now trying to
upgrade Mahout to use Spark 1.1.0. The Mahout 1.0-snapshot source builds, and the build tests pass,
with Spark 1.1.0 as a Maven dependency. I am running the Mahout build on some bigger data, using my
dev machine as a standalone single-node Spark cluster. So this is the same code that ran in the build
tests, just in single-node cluster mode. Also, since I built Spark myself, I assume it is using the
artifact from my .m2 Maven cache, but I am not 100% sure of that. Either way, I get the class-not-found error.

I assume the missing function is the anon function passed to the{anon function})saveAsTextFile
???? so shouldn't the function be in the Mahout jar (it isn't)? If this function is passed
in from Mahout, I don't understand why it matters how Spark was built.

Several other users are getting this for Spark 1.0.2. If we are doing something wrong in our
build process we'd appreciate a pointer.

Here's the error I get:

14/10/20 17:21:36 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 8.0 (TID 16,
java.lang.ClassNotFoundException: org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1$ Method)
        java.lang.Class.forName0(Native Method)

> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>                 Key: SPARK-2075
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>            Priority: Critical
>             Fix For: 1.0.1
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there.  It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
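
The jar checks quoted above can also be scripted, which may help when comparing several assemblies or a locally built artifact. The following is only a minimal sketch, assuming Python is available on the machine; the function name `find_class_entries` is hypothetical and not part of Spark, Mahout, or the JDK. It treats the jar as an ordinary zip archive and keeps the entry names that contain every search fragment, roughly what the chained `grep` calls do.

```python
import zipfile


def find_class_entries(jar_path, fragments):
    """Return the jar entry names that contain every given fragment.

    A jar file is a zip archive, so the standard zipfile module can list it.
    Unlike `jar tvf ... | grep`, this matches on the entry name only, not on
    the size or timestamp columns.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            if all(fragment in name for fragment in fragments)
        ]
```

For example, `find_class_entries("spark-assembly-1.0.0-hadoop1.0.4.jar", ["rdd/RDD", "saveAs"])` returning an empty list would correspond to the empty `grep` output reported for the hadoop1 distribution.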
