spark-user mailing list archives

From Axel Dahl <a...@whisperstream.com>
Subject Re: how do I execute a job on a single worker node in standalone mode
Date Wed, 19 Aug 2015 16:27:35 GMT
Hmm, maybe I spoke too soon.

I have an Apache Zeppelin instance running and have configured it to use 48
cores (each node has only 16 cores), so I figured that setting it to 48
would make Spark grab 3 nodes. What happens instead is that Spark reports
48 cores in use but executes everything on 1 node; it looks like it's not
grabbing the extra nodes.

On Wed, Aug 19, 2015 at 8:43 AM, Axel Dahl <axel@whisperstream.com> wrote:

> That worked great, thanks Andrew.
>
> On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <andrew@databricks.com> wrote:
>
>> Hi Axel,
>>
>> You can try setting `spark.deploy.spreadOut` to false (through your
>> conf/spark-defaults.conf file). What this does is essentially try to
>> schedule as many cores on one worker as possible before spilling over to
>> other workers. Note that you *must* restart the cluster through the sbin
>> scripts.
>>
>> For more information see:
>> http://spark.apache.org/docs/latest/spark-standalone.html.
>>
>> Feel free to let me know whether it works,
>> -Andrew
>>
>>
>> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.berman@gmail.com>:
>>
>>> By default, standalone mode creates 1 executor on every worker machine
>>> per application.
>>> The overall number of cores is configured with --total-executor-cores,
>>> so in general if you specify --total-executor-cores=1 there will be
>>> only 1 core on one executor, and you'll get what you want.
>>>
>>> On the other hand, if your application needs all the cores of your
>>> cluster and only some specific job should run on a single executor,
>>> there are a few methods to achieve this,
>>> e.g. coalesce(1) or dummyRddWithOnePartitionOnly.foreachPartition
>>>
>>>
>>> On 18 August 2015 at 01:36, Axel Dahl <axel@whisperstream.com> wrote:
>>>
>>>> I have a 4-node cluster and have been playing around with the
>>>> num-executors, executor-memory and executor-cores parameters.
>>>>
>>>> I set the following:
>>>> --executor-memory=10G
>>>> --num-executors=1
>>>> --executor-cores=8
>>>>
>>>> But when I run the job, I see that each worker is running one executor
>>>> which has 2 cores and 2.5G memory.
>>>>
>>>> What I'd like to do instead is have Spark just allocate the job to a
>>>> single worker node.
>>>>
>>>> Is that possible in standalone mode or do I need a job/resource
>>>> scheduler like Yarn to do that?
>>>>
>>>> Thanks in advance,
>>>>
>>>> -Axel
>>>>
>>>>
>>>>
>>>
>>
>
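
The behaviour Andrew describes for `spark.deploy.spreadOut` can be illustrated with a small standalone model. This is only a sketch of the idea as stated in the thread (spread cores across workers versus filling one worker before spilling over); it is not Spark's scheduler code, and the function name `allocate_cores` and its parameters are hypothetical. It assumes each worker reports a number of free cores and ignores memory and per-executor limits.

```python
def allocate_cores(free_cores, requested, spread_out):
    """Simplified model of core placement across standalone workers.

    free_cores: list with the free core count of each worker (assumed input).
    requested:  total cores the application asks for.
    spread_out: True  -> distribute cores round-robin across workers,
                False -> fill one worker before moving to the next.
    Returns a list with the number of cores assigned on each worker.
    """
    assigned = [0] * len(free_cores)
    # Never hand out more cores than the cluster actually has free.
    remaining = min(requested, sum(free_cores))

    if spread_out:
        # Give one core at a time to each worker in turn.
        i = 0
        while remaining > 0:
            if assigned[i] < free_cores[i]:
                assigned[i] += 1
                remaining -= 1
            i = (i + 1) % len(free_cores)
    else:
        # Take as many cores as possible from each worker in order.
        for i, free in enumerate(free_cores):
            take = min(free, remaining)
            assigned[i] = take
            remaining -= take

    return assigned


if __name__ == "__main__":
    workers = [16, 16, 16, 16]  # 4 nodes with 16 cores each, as in the thread
    print(allocate_cores(workers, 8, spread_out=True))
    print(allocate_cores(workers, 8, spread_out=False))
    print(allocate_cores(workers, 48, spread_out=False))
```

Under this model, a request for 8 cores with spreading enabled lands as 2 cores on each of the 4 workers, which matches what Axel first observed. With spreading disabled the same request fits on a single worker, and a request for 48 cores fills three 16-core workers and leaves the fourth idle.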
