spark-user mailing list archives

From Han JU <>
Subject Spark variable init problem
Date Wed, 07 Aug 2013 13:47:17 GMT

I'm just getting started with Spark and wrote a simple job in Scala.
A sketch of it:

val TAB = "\t"
val support = 2

val sc = new SparkContext(...)
val raw = sc.textFile(...)

val filtered = raw.map(line => {
    val lineSplit = line.split(TAB) // TAB is null here, so an exception is thrown during the run
  }).filter(p => p._2 >= support) // support is 0 here during the run


I run the sbt-assembly jar with "java -cp ..." on a standalone cluster. I
found that when they are referenced inside the RDD transformations, the two
values, TAB and support, hold their default values: TAB is null and support
is 0, instead of "\t" and 2 as initialized above.

If the same jar is run locally (MASTER is local or local[k] instead of
spark://...) on the same input, it runs perfectly. The code also runs fine
in spark-shell on the cluster.

For the jar to run correctly on the cluster, I have to hard-code the string
literal and the number inside the RDD transformations.
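
The workaround might look roughly like the sketch below. It is only an illustration: a plain Seq stands in for the RDD so that it runs without a cluster, and the "key<TAB>count" input format, the object name HardCodedWorkaround and the method name run are assumptions made for the example, not part of the actual job.

```scala
object HardCodedWorkaround {
  // Assumption: each input line looks like "key<TAB>count".
  // A plain Seq stands in for the RDD returned by sc.textFile(...).
  val sampleLines: Seq[String] = Seq("a\t3", "b\t1", "c\t2")

  def run(lines: Seq[String]): Seq[(String, Int)] =
    lines.map { line =>
      // The literal "\t" is written inline instead of referencing TAB.
      val lineSplit = line.split("\t")
      (lineSplit(0), lineSplit(1).toInt)
    }.filter(p => p._2 >= 2) // The literal 2 is written inline instead of referencing support.
}
```

With the literals inlined, the functions passed to map and filter no longer refer to any field defined outside them, which is the only difference from the failing version.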

This really looks like a weird bug to me. Could it have something to do with
how the sbt-assembly jar is built? Any suggestions?


I'm using Spark 0.7.3 and Scala 2.9.3.

JU Han

Software Engineer Intern @ KXEN Inc.
UTC - Université de Technologie de Compiègne
GI06 - Fouille de Données et Décisionnel

+33 0619608888
