spark-issues mailing list archives

From "Iulian Dragos (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-2075) Anonymous classes are missing from Spark distribution
Date Wed, 15 Oct 2014 14:13:33 GMT

    [ https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172394#comment-14172394 ]

Iulian Dragos commented on SPARK-2075:
--------------------------------------

The Scala compiler produces stable names for anonymous functions. In fact, that's the reason
why the name of the enclosing method is part of the name: adding or removing an anonymous
function in one method does not change the numbering of those in other methods. Names are
assigned using a per-compilation-unit counter and a prefix. Looking at the diff, the two cases
(anonymous functions vs. anonymous classes) paint quite different pictures. Are you sure the
two jars are built from the same sources?
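A simplified model of the naming scheme described above, written in Java because the failure surfaces as a JVM `ClassNotFoundException`. It assumes the Scala 2.10 pattern `Outer$$anonfun$method$N` seen in the exception, and it assumes the counter is kept per prefix within one compilation unit. The class `AnonFunNames` and its `fresh` method are hypothetical and are not scalac's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of Scala 2.10-style names for anonymous-function
// classes. Not scalac's real code: it only mimics the "prefix + counter" scheme.
final class AnonFunNames {
    // One instance stands for one compilation unit. Counters are keyed by prefix,
    // so each enclosing method gets its own numbering.
    private final Map<String, Integer> counters = new HashMap<>();

    // Returns a name such as org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.
    String fresh(String enclosingClass, String enclosingMethod) {
        String prefix = enclosingClass + "$$anonfun$" + enclosingMethod + "$";
        int n = counters.merge(prefix, 1, Integer::sum);
        return prefix + n;
    }

    public static void main(String[] args) {
        String rdd = "org.apache.spark.rdd.RDD";

        // A unit with no closures in map.
        AnonFunNames before = new AnonFunNames();
        System.out.println(before.fresh(rdd, "saveAsTextFile"));
        // org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1

        // The same unit after adding two closures to map.
        AnonFunNames after = new AnonFunNames();
        after.fresh(rdd, "map");
        after.fresh(rdd, "map");
        System.out.println(after.fresh(rdd, "saveAsTextFile"));
        // org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1 (unchanged)
    }
}
```

Because the counter is keyed by a prefix that includes the method name, closures added to `map` cannot shift the numbers assigned to closures in `saveAsTextFile`.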

I don't know how the `assembly` jar is produced, but if it uses some sort of whole-program
analysis and dead-code elimination, that step might erroneously remove these classes. It might
help to look at the inputs to the assembly and check whether the class is already missing there.
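One way to follow that suggestion is to scan each jar that feeds the assembly, and the assembly itself, for the closure classes. The sketch below relies only on `java.util.jar.JarFile`. The class name `JarClassFinder` and its command-line usage are hypothetical, and which jars actually feed the Spark assembly depends on the build and is not covered here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: reports which jars contain entries with a given prefix,
// roughly what `jar tvf <jar> | grep ...` does for a single jar.
final class JarClassFinder {
    // Returns the names of all entries in the jar that start with entryPrefix.
    static List<String> findEntries(String jarPath, String entryPrefix) throws IOException {
        List<String> hits = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry e : Collections.list(jar.entries())) {
                if (e.getName().startsWith(entryPrefix)) {
                    hits.add(e.getName());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java JarClassFinder.java input1.jar input2.jar ... assembly.jar
        String prefix = "org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$";
        for (String path : args) {
            List<String> hits = findEntries(path, prefix);
            System.out.println((hits.isEmpty() ? "MISSING  " : "present  ") + path + " " + hits);
        }
    }
}
```

If the classes are present in every input but missing from the assembly, the packaging step is the likely culprit; if an input already lacks them, the problem lies earlier in the build.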

Another possibility is that `scalac -optimize` is used in only one of the two builds. However,
looking at the current sources, I can't see why the inliner would remove those closures: the class
is not final, and `map` is not final either, so the calls can't be statically resolved and inlined.
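To see exactly which classes differ between two builds (for example the hadoop1 and hadoop2 assemblies, or the Maven jar and an assembly), a set difference over the jars' `.class` entries is enough. This is a minimal sketch: the name `JarDiff` is hypothetical, and restricting it to `org/apache/spark/rdd/` is only an example filter.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper that diffs the .class entries of two jars under one package.
final class JarDiff {
    // Collects .class entry names under the given package path, sorted.
    static Set<String> classEntries(String jarPath, String packagePath) throws IOException {
        Set<String> names = new TreeSet<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry e : Collections.list(jar.entries())) {
                String n = e.getName();
                if (n.startsWith(packagePath) && n.endsWith(".class")) {
                    names.add(n);
                }
            }
        }
        return names;
    }

    // Entries present in left but absent from right.
    static Set<String> onlyIn(Set<String> left, Set<String> right) {
        Set<String> diff = new TreeSet<>(left);
        diff.removeAll(right);
        return diff;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java JarDiff.java <jarA> <jarB>");
            return;
        }
        String pkg = "org/apache/spark/rdd/";
        Set<String> a = classEntries(args[0], pkg);
        Set<String> b = classEntries(args[1], pkg);
        onlyIn(a, b).forEach(n -> System.out.println("only in A: " + n));
        onlyIn(b, a).forEach(n -> System.out.println("only in B: " + n));
    }
}
```

Listing only anonymous-function classes versus also top-level or anonymous classes in the output would help tell apart the "different sources" explanation from a packaging or optimization step that drops closures.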

> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>
>                 Key: SPARK-2075
>                 URL: https://issues.apache.org/jira/browse/SPARK-2075
>             Project: Spark
>          Issue Type: Bug
>          Components: Build, Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Paul R. Brown
>            Priority: Critical
>             Fix For: 1.0.1
>
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1 distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there.  It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


