spark-user mailing list archives

From Eric Friedman <e...@spottedsnake.net>
Subject Re: specifying worker nodes when using the repl?
Date Mon, 19 May 2014 20:30:11 GMT
Sandy, thank you so much — that was indeed my omission!
Eric

On May 19, 2014, at 10:14 AM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:

> Hi Eric,
> 
> Have you tried setting the SPARK_WORKER_INSTANCES env variable before running spark-shell?
> http://spark.apache.org/docs/0.9.0/running-on-yarn.html
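> 
> For example, something like this in your launch script before starting the shell (an untested sketch; the instance count of 100 and the paths are just taken from the script and spark-class invocation you posted):
> 
> #!/bin/sh
> 
> # Pull in the cluster-provided Spark environment
> . /etc/spark/conf.cloudera.spark/spark-env.sh
> 
> export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
> export SPARK_WORKER_MEMORY=512m
> export MASTER=yarn-client
> 
> # Request more than the default 2 YARN workers (analogous to --num-workers)
> export SPARK_WORKER_INSTANCES=100
> 
> exec $SPARK_HOME/bin/spark-shell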
> 
> -Sandy
> 
> 
> On Mon, May 19, 2014 at 8:08 AM, Eric Friedman <eric@spottedsnake.net> wrote:
> Hi
> 
> I am working with a Cloudera 5 cluster with 192 nodes and can’t work out how to get the Spark REPL to use more than 2 nodes in an interactive session.
> 
> So, this works, but is non-interactive (using yarn-client as MASTER)
> 
> /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/bin/spark-class \
>   org.apache.spark.deploy.yarn.Client \
>   --jar /opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/lib/spark/examples/lib/spark-examples_2.10-0.9.0-cdh5.0.0.jar \
>   --class org.apache.spark.examples.SparkPi \
>   --args yarn-standalone \
>   --args 10 \
>   --num-workers 100
> 
> There does not appear to be an (obvious?) way to get more than 2 nodes involved from the REPL.
> 
> I am running the REPL like this:
> 
> #!/bin/sh
> 
> . /etc/spark/conf.cloudera.spark/spark-env.sh
> 
> export SPARK_JAR=hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar
> 
> export SPARK_WORKER_MEMORY=512m
> 
> export MASTER=yarn-client
> 
> exec $SPARK_HOME/bin/spark-shell
> 
> Now if I comment out the line with `export SPARK_JAR=…’ and run this again, I get an error like this:
> 
> 14/05/19 08:03:41 ERROR Client: Error: You must set SPARK_JAR environment variable!
> Usage: org.apache.spark.deploy.yarn.Client [options] 
> Options:
>   --jar JAR_PATH             Path to your application's JAR file (required in yarn-cluster mode)
>   --class CLASS_NAME         Name of your application's main class (required)
>   --args ARGS                Arguments to be passed to your application's main class.
>                              Multiple invocations are possible, each will be passed in order.
>   --num-workers NUM          Number of workers to start (Default: 2)
>   […]
> 
> But none of those options are exposed at the `spark-shell’ level.
> 
> Thanks in advance for your guidance.
> 
> Eric
> 

