spark-user mailing list archives

From Andrew Ash <>
Subject Is spark-env.sh supposed to be stateless?
Date Fri, 03 Jan 2014 06:33:44 GMT
In my spark-env.sh I append to the SPARK_CLASSPATH variable rather than
overriding it, because I want to support both adding a jar to all instances
of a shell (in spark-env.sh) and adding a jar to a single shell instance
(setting SPARK_CLASSPATH=/path/to/my.jar when launching that shell).

That looks like this:

export SPARK_CLASSPATH+=":/path/to/hadoop-lzo.jar"

However, when my master and workers run, they have duplicates of the
SPARK_CLASSPATH jars: there are 3 copies of hadoop-lzo on the classpath, 2
of which are unnecessary.
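(Not something from my current setup, just a sketch of a possible workaround: the append itself can be made idempotent, so that sourcing the file any number of times cannot duplicate the entry. Plain POSIX sh:)

```shell
# Sketch: append the jar to SPARK_CLASSPATH only if it is not already there,
# so repeated sourcing of this file is harmless. The path is the same
# illustrative one as above.
jar="/path/to/hadoop-lzo.jar"
case ":$SPARK_CLASSPATH:" in
  *":$jar:"*) ;;                                        # already present, do nothing
  *) export SPARK_CLASSPATH="$SPARK_CLASSPATH:$jar" ;;  # first time, append
esac
```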

The resulting command line in ps looks like this:

/path/to/java -cp [3 copies of the SPARK_CLASSPATH jars plus the usual
spark jars] ... -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker

I tracked it down, and the problem is that spark-env.sh is sourced 3 times
on the way to launching a daemon: twice by intermediate launch scripts and
finally in spark-class.  Each of those adds to SPARK_CLASSPATH until its
contents are in triplicate.

Are all of those calls necessary?  Is it possible to edit the daemon
scripts so the file is only sourced once?
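(Again just a sketch, as an alternative to editing every call site: the file itself could bail out after the first sourcing, using a sentinel variable. SPARK_ENV_LOADED is a name I'm inventing here for illustration:)

```shell
# Sketch: at the top of spark-env.sh, make repeated sourcing a no-op.
# SPARK_ENV_LOADED is a hypothetical sentinel, exported so that child
# processes that source the file again also skip the body.
if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1
  export SPARK_CLASSPATH="$SPARK_CLASSPATH:/path/to/hadoop-lzo.jar"
fi
```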

FYI I'm starting the daemons with ./bin/start-master.sh and
./bin/start-slave.sh 1 $SPARK_URL

