spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: how do I execute a job on a single worker node in standalone mode
Date Wed, 19 Aug 2015 20:24:00 GMT
Hi Axel, what Spark version are you using? Also, what do your
configurations look like for the following?

- spark.cores.max (also --total-executor-cores)
- spark.executor.cores (also --executor-cores)
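
One way to check the two settings above is to read them out of conf/spark-defaults.conf. The following is only a minimal sketch: `read_spark_defaults` is a hypothetical helper, it assumes the usual whitespace-separated "key value" lines, and the sample values are made up for illustration. Values passed on the command line or set in application code will not appear in the file.

```python
def read_spark_defaults(text):
    """Parse the text of a spark-defaults.conf style file into a dict.

    Assumes one "key value" pair per line, separated by whitespace,
    with blank lines and lines starting with '#' ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


# Hypothetical file contents, for illustration only.
sample = """
# cluster-wide defaults
spark.cores.max        48
spark.executor.cores   16
"""

conf = read_spark_defaults(sample)
for key in ("spark.cores.max", "spark.executor.cores"):
    print(key, "=", conf.get(key, "<not set>"))
```

In practice the text would come from reading the real file, for example `open("conf/spark-defaults.conf").read()`.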


2015-08-19 9:27 GMT-07:00 Axel Dahl <axel@whisperstream.com>:

> hmm maybe I spoke too soon.
>
> I have an Apache Zeppelin instance running and have configured it to use
> 48 cores (each node only has 16 cores), so I figured that setting it to 48
> would mean that Spark would grab 3 nodes. What happens instead is that
> Spark reports that 48 cores are being used but executes everything on
> 1 node; it looks like it's not grabbing the extra nodes.
>
> On Wed, Aug 19, 2015 at 8:43 AM, Axel Dahl <axel@whisperstream.com> wrote:
>
>> That worked great, thanks Andrew.
>>
>> On Tue, Aug 18, 2015 at 1:39 PM, Andrew Or <andrew@databricks.com> wrote:
>>
>>> Hi Axel,
>>>
>>> You can try setting `spark.deploy.spreadOut` to false (through your
>>> conf/spark-defaults.conf file). What this does is essentially try to
>>> schedule as many cores on one worker as possible before spilling over to
>>> other workers. Note that you *must* restart the cluster through the sbin
>>> scripts.
>>>
>>> For more information see:
>>> http://spark.apache.org/docs/latest/spark-standalone.html.
>>>
>>> Feel free to let me know whether it works,
>>> -Andrew
>>>
>>>
>>> 2015-08-18 4:49 GMT-07:00 Igor Berman <igor.berman@gmail.com>:
>>>
>>>> By default, standalone mode creates 1 executor on every worker machine
>>>> per application. The overall number of cores is configured with
>>>> --total-executor-cores, so in general if you specify
>>>> --total-executor-cores=1 then there will be only 1 core on some executor
>>>> and you'll get what you want.
>>>>
>>>> On the other hand, if your application needs all the cores of your cluster
>>>> and only some specific job should run on a single executor, there are a
>>>> few methods to achieve this,
>>>> e.g. coalesce(1) or dummyRddWithOnePartitionOnly.foreachPartition
>>>>
>>>>
>>>> On 18 August 2015 at 01:36, Axel Dahl <axel@whisperstream.com> wrote:
>>>>
>>>>> I have a 4 node cluster and have been playing around with the
>>>>> num-executors parameters, executor-memory and executor-cores
>>>>>
>>>>> I set the following:
>>>>> --executor-memory=10G
>>>>> --num-executors=1
>>>>> --executor-cores=8
>>>>>
>>>>> But when I run the job, I see that each worker is running one
>>>>> executor which has 2 cores and 2.5G memory.
>>>>>
>>>>> What I'd like to do instead is have Spark just allocate the job to a
>>>>> single worker node.
>>>>>
>>>>> Is that possible in standalone mode or do I need a job/resource
>>>>> scheduler like Yarn to do that?
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> -Axel
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
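
The effect of `spark.deploy.spreadOut` described in the quoted thread can be pictured with a toy model. This is a simplified sketch, not the standalone master's actual code: `allocate_cores` is a hypothetical function, the four workers with 16 free cores each are taken from the cluster described in the thread, and the 8-core cap is an assumption chosen to match the reported 2 cores per worker.

```python
def allocate_cores(free_cores, total_cores, spread_out):
    """Toy model of handing out cores to one application.

    free_cores:  list with the number of free cores on each worker
    total_cores: cap on cores for the application (like spark.cores.max)
    spread_out:  models spark.deploy.spreadOut
    Returns the number of cores assigned on each worker.
    """
    assigned = [0] * len(free_cores)
    # Never try to hand out more cores than the cluster has free.
    remaining = min(total_cores, sum(free_cores))
    if spread_out:
        # Round-robin: one core per worker per pass, skipping full workers.
        i = 0
        while remaining > 0:
            if assigned[i] < free_cores[i]:
                assigned[i] += 1
                remaining -= 1
            i = (i + 1) % len(free_cores)
    else:
        # Fill one worker as far as possible before spilling to the next.
        for i, free in enumerate(free_cores):
            take = min(free, remaining)
            assigned[i] = take
            remaining -= take
    return assigned


workers = [16, 16, 16, 16]  # 4 nodes, 16 cores each

print(allocate_cores(workers, 8, spread_out=True))    # [2, 2, 2, 2]
print(allocate_cores(workers, 8, spread_out=False))   # [8, 0, 0, 0]
print(allocate_cores(workers, 48, spread_out=False))  # [16, 16, 16, 0]
```

In this model, an 8-core application is spread thinly over all four workers by default and packed onto a single worker when spreading is turned off, while a 48-core request with spreading off would be expected to fill three workers.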
