Hi Axel, what Spark version are you using? Also, what do your configurations look like for the following?

- spark.cores.max (also --total-executor-cores)
- spark.executor.cores (also --executor-cores)
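
For reference, these can be set either in conf/spark-defaults.conf or on the spark-submit command line; the values below are only placeholders:

    spark.cores.max       48
    spark.executor.cores  16

or, equivalently, on the command line:

    spark-submit --total-executor-cores 48 --executor-cores 16 ...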


2015-08-19 9:27 GMT-07:00 Axel Dahl <axel@whisperstream.com>:
Hmm, maybe I spoke too soon.

I have an Apache Zeppelin instance running and have configured it to use 48 cores (each node only has 16 cores), so I figured that setting it to 48 would mean Spark would grab 3 nodes. What happens instead is that Spark reports 48 cores in use but executes everything on 1 node; it looks like it's not grabbing the extra nodes.

On Wed, Aug 19, 2015 at 8:43 AM, Axel Dahl <axel@whisperstream.com> wrote:
That worked great, thanks Andrew.

On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <andrew@databricks.com> wrote:
Hi Axel,

You can try setting `spark.deploy.spreadOut` to false (through your conf/spark-defaults.conf file). What this does is essentially schedule as many cores on one worker as possible before spilling over to other workers. Note that you *must* restart the cluster through the sbin scripts for the change to take effect.
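
For example, in conf/spark-defaults.conf (the value below just flips the default, which is true):

    spark.deploy.spreadOut  false

and then restart the cluster, e.g. with sbin/stop-all.sh followed by sbin/start-all.sh.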


Feel free to let me know whether it works,
-Andrew


2015-08-18 4:49 GMT-07:00 Igor Berman <igor.berman@gmail.com>:
By default, standalone mode creates 1 executor on every worker machine per application.
The total number of cores across all executors is configured with --total-executor-cores,
so if you specify --total-executor-cores=1, there will be only 1 core on a single executor and you'll get what you want.
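
For example (the master URL and application jar are placeholders):

    spark-submit --master spark://<master>:7077 --total-executor-cores 1 your-app.jar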

On the other hand, if your application needs all the cores of your cluster and only some specific job should run on a single executor, there are a few methods to achieve this,
e.g. coalesce(1) or dummyRddWithOnePartitionOnly.foreachPartition
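
A rough sketch in Scala (names and the sample data are illustrative only; e.g. in spark-shell, where sc is predefined):

    // stand-in for whatever RDD the rest of the application produced
    val someRdd = sc.parallelize(1 to 100)

    // coalesce(1) collapses the data into one partition, so the block below
    // runs as a single task on a single executor
    someRdd.coalesce(1).foreachPartition { iter =>
      iter.foreach(record => println(record))  // placeholder for the real per-record work
    }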


On 18 August 2015 at 01:36, Axel Dahl <axel@whisperstream.com> wrote:
I have a 4-node cluster and have been playing around with the num-executors, executor-memory, and executor-cores parameters.

I set the following:
--executor-memory=10G
--num-executors=1
--executor-cores=8

But when I run the job, I see that each worker is running one executor with 2 cores and 2.5G of memory.

What I'd like to do instead is have Spark allocate the job to a single worker node.

Is that possible in standalone mode, or do I need a job/resource scheduler like YARN to do that?

Thanks in advance,

-Axel