My guess is that you're asking for all the cores on all the machines, but the YARN ApplicationMaster / driver needs at least one core as well, so one of the executors is unable to find a machine it fits on.
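
If that's the case, one workaround is to leave a core (and some memory headroom) free for the ApplicationMaster, e.g. something like the following. This is just a sketch: the 31 is an assumption, and the exact numbers depend on how many vcores and how much memory YARN is actually configured to offer per node.

./spark-submit --master yarn --num-executors 3 --executor-cores 31 --executor-memory 32g feature_extractor.py -r 390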

On Nov 18, 2014 7:04 PM, "Alan Prando" <alan@scanboo.com.br> wrote:
Hi Folks!

I'm running Spark on a YARN cluster installed with Cloudera Manager Express.
The cluster has 1 master and 3 slaves, each machine with 32 cores and 64G RAM.

My Spark job works fine, but it seems that only 2 of the 3 slaves are doing work (htop shows 2 slaves at 100% on all 32 cores, and 1 slave without any processing).

I'm using this command:
./spark-submit --master yarn --num-executors 3 --executor-cores 32  --executor-memory 32g feature_extractor.py -r 390

Additionally, Spark's log shows communication with only 2 slaves:
14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469] with ID 1
14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-180.ec2.internal to /default
14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724] with ID 2
14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-179.ec2.internal to /default
14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)

Is there a configuration that makes a Spark job on the YARN cluster use all of the slaves?

Thanks in advance! =]

---
Regards
Alan Vidotti Prando.