spark-user mailing list archives

From Matei Zaharia <>
Subject Re: Is supposed to be stateless?
Date Fri, 03 Jan 2014 07:07:51 GMT
I agree that it would be good to do it only once, if you can find a nice way of doing so.
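One possible way of doing it only once (a hedged sketch, not something from this thread; the sentinel name SPARK_ENV_LOADED is a hypothetical choice, not a variable the scripts define) is to guard the body of the environment script so that sourcing it repeatedly is a no-op:

```shell
# Hypothetical sketch: make the environment script idempotent with a
# sentinel variable, so sourcing it a second or third time does nothing.
# SPARK_ENV_LOADED is an assumed name, not defined by the Spark scripts.
if [ -z "$SPARK_ENV_LOADED" ]; then
  export SPARK_ENV_LOADED=1
  export SPARK_CLASSPATH="${SPARK_CLASSPATH}:/path/to/hadoop-lzo.jar"
fi
```

With a guard like this, it would not matter how many of the launcher scripts source the file: only the first sourcing appends to SPARK_CLASSPATH.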


On Jan 3, 2014, at 1:33 AM, Andrew Ash <> wrote:

> In my environment script I append to the SPARK_CLASSPATH variable rather than overriding it,
> because I want to support both adding a jar to all shell instances (in that script) and
> adding a jar to a single shell instance (SPARK_CLASSPATH=/path/to/my.jar /path/to/spark-shell).
> That looks like this:
> #
> export SPARK_CLASSPATH+=":/path/to/hadoop-lzo.jar"
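An append that tolerates repeated sourcing is also possible (a sketch, not from the thread): check whether the jar is already on the path before appending, using a case pattern:

```shell
# Hypothetical sketch: append the jar only if it is not already present,
# so repeated sourcing leaves a single copy on the classpath.
JAR="/path/to/hadoop-lzo.jar"
case ":${SPARK_CLASSPATH}:" in
  *":${JAR}:"*) ;;  # already on the classpath; leave it alone
  *) export SPARK_CLASSPATH="${SPARK_CLASSPATH}:${JAR}" ;;
esac
```

Wrapping the path in colons on both sides makes the membership test exact, so a jar whose path is a substring of another entry is not mistaken for a duplicate.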
> However when my Master and workers run, they have duplicates of the SPARK_CLASSPATH jars.
> There are 3 copies of hadoop-lzo on the classpath, 2 of which are unnecessary.
> The resulting command line in ps looks like this:
> /path/to/java -cp :/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:[core
> spark jars] ... -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://my-host:7077
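As a stopgap for an already-duplicated path (a hedged sketch, not something the Spark scripts do), repeated entries in a colon-separated classpath can be collapsed while preserving order, e.g. with awk:

```shell
# Hypothetical sketch: collapse repeated entries in a colon-separated
# classpath, keeping only the first occurrence of each entry in order.
CP=":/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar:/path/to/hadoop-lzo.jar"
DEDUPED=$(printf '%s' "$CP" | awk -v RS=: -v ORS=: '!seen[$0]++' | sed 's/:$//')
echo "$DEDUPED"   # a single copy of the jar remains
```

This treats each colon-separated field as an awk record and prints it only the first time it is seen, so the triplicated jar above reduces to one entry.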
> I tracked it down, and the problem is that the environment script is sourced 3 times,
> including in spark-class.  Each sourcing appends to SPARK_CLASSPATH, until its contents
> are in triplicate.
> Are all of those calls necessary?  Is it possible to edit the daemon scripts so that it
> is only sourced once?
> FYI I'm starting the daemons with ./bin/ and ./bin/ 1 $SPARK_URL
> Thanks,
> Andrew
