spark-user mailing list archives

From "Ian O'Connell" <...@ianoconnell.com>
Subject Re: Spark variable init problem
Date Wed, 07 Aug 2013 15:10:52 GMT
Is your code perhaps part of an object? The closure cleaner doesn't
attempt to pull in parts of objects.

What does the code around your sketched section look like?
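
For reference, here is a minimal sketch of that pattern and the usual
workaround (the object name, master URL and file paths below are made up
for illustration): copy the object-level vals into plain local vals before
building the RDD transformations, so the closure captures the values
themselves instead of reaching back into the enclosing object.

    import spark.SparkContext   // 0.7.x package name

    object WordFilterJob {
      // Vals on the object itself: a closure that uses them reaches back
      // into the object, which the closure cleaner won't pull apart.
      val TAB = "\t"
      val support = 2

      def main(args: Array[String]) {
        val sc = new SparkContext("spark://master:7077", "WordFilterJob")

        // Workaround: copy the fields into locals so the closure captures
        // plain values that get serialized along with the task.
        val tab = TAB
        val minSupport = support

        val raw = sc.textFile("hdfs:///path/to/input")

        val filtered = raw
          .map { line =>
            val lineSplit = line.split(tab)   // uses the captured local
            (lineSplit(0), lineSplit.length)
          }
          .filter(p => p._2 >= minSupport)

        filtered.saveAsTextFile("hdfs:///path/to/output")
      }
    }

If your vals are defined in the body of an object (especially one extending
App, where they are only set when main runs), that would line up with the
symptom below: on the cluster the closure sees default field values, while
in local mode everything runs in the JVM that already initialized them.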


On Wed, Aug 7, 2013 at 6:47 AM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi,
>
> I'm just getting started with Spark and I wrote a simple job in Scala.
> It looks roughly like this:
>
> val TAB = "\t"
>
> val support = 2
>
> val sc = new SparkContext(...)
>
> val raw = sc.textFile(...)
>
> val filtered = raw.map(
>
>   line => {
>
>     // TAB is null here and an exception is thrown during the run
>     val lineSplit = line.split(TAB)
>     ...
>
>   }).filter(p => p._2 >= support) // support here is 0 during the run
>
> ...
>
> I run the sbt-assembly jar with "java -cp ..." on a standalone cluster, and I
> found that when referenced inside the RDD transformation, the two values, TAB
> and support, are set to their default values: TAB is null and support is 0,
> no longer "\t" and 2 as initialized above.
>
> If the same jar is run locally (MASTER set to local or local[k] instead of
> spark://...) on the same input, it runs perfectly. The code also runs fine
> in the spark-shell on the cluster.
>
> For the jar to run correctly on the cluster, I have to hard-code the string
> literal and the number inside the RDD transformation.
>
> It really looks like a weird bug to me; maybe it has something to do with how
> the sbt-assembly jar is built? Any suggestions?
>
> Thanks.
>
> I'm using Spark 0.7.3 and Scala 2.9.3.
>
> --
> JU Han
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> GI06 - Fouille de Données et Décisionnel
>
> +33 0619608888
>
