spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Starting executor without a master
Date Fri, 20 May 2016 06:57:06 GMT
OK this is basically form my notes for Spark standalone. Worker process is
the slave process

[image: Inline images 2]



You start worker as you showed

$SPARK_HOME/sbin/start-slaves.sh
Now that picks up the worker host node names from $SPARK_HOME/conf/slaves
files. So you still have to tell Spark where to run workers.

However, if I am correct regardless of what you have specified in slaves,
in this standalone mode there will not be any spark process spawned by the
driver on the slaves. In all probability you will be running one
spark-submit process on the driver node. You can see this through the
output of

jps|grep SparkSubmit

and you will see the details by running jmonitor for that SparkSubmit job

However, I still doubt whether Scheduling Across applications is feasible
in standalone mode.

The doc says

*Standalone mode:* By default, applications submitted to the standalone
mode cluster will run in FIFO (first-in-first-out) order, and each
application will try to use *all available nodes*. You can limit the number
of nodes an application uses by setting the spark.cores.max configuration
property in it, or change the default for applications that don’t set this
setting through spark.deploy.defaultCores. Finally, in addition to
controlling cores, each application’s spark.executor.memory setting
controls its memory use.

It uses the word all available nodes but I am not convinced if it will use
those nodes? Someone can possibly clarify this

HTH


Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 20 May 2016 at 02:03, Mathieu Longtin <mathieu@closetwork.org> wrote:

> Okay:
> *host=my.local.server*
> *port=someport*
>
> This is the spark-submit command, which runs on my local server:
> *$SPARK_HOME/bin/spark-submit --master spark://$host:$port
> --executor-memory 4g python-script.py with args*
>
> If I want 200 worker cores, I tell the cluster scheduler to run this
> command on 200 cores:
> *$SPARK_HOME/sbin/start-slave.sh --cores=1 --memory=4g
> spark://$host:$port *
>
> That's it. When the task starts, it uses all available workers. If for
> some reason, not enough cores are available immediately, it still starts
> processing with whatever it gets and the load will be spread further as
> workers come online.
>
>
> On Thu, May 19, 2016 at 8:24 PM Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> In a normal operation we tell spark which node the worker processes can
>> run by adding the nodenames to conf/slaves.
>>
>> Not very clear on this in your case all the jobs run locally with say 100
>> executor cores like below:
>>
>>
>> ${SPARK_HOME}/bin/spark-submit \
>>
>>                 --master local[*] \
>>
>>                 --driver-memory xg \  --default would be 512M
>>
>>                 --num-executors=1 \   -- This is the constraint in
>> stand-alone Spark cluster, whether specified or not
>>
>>                 --executor-memory=xG \ --
>>
>>                 --executor-cores=n \
>>
>> --master local[*] means all cores and --executor-cores in your case need
>> not be specified? or you can cap it like above --executor-cores=n. If it
>> is not specified then the Spark app will go and grab every core. Although
>> in practice that does not happen it is just an upper ceiling. It is FIFO.
>>
>> What typical executor memory is specified in your case?
>>
>> Do you have a  sample snapshot of spark-submit job by any chance Mathieu?
>>
>> Cheers
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 20 May 2016 at 00:27, Mathieu Longtin <mathieu@closetwork.org> wrote:
>>
>>> Mostly, the resource management is not up to the Spark master.
>>>
>>> We routinely start 100 executor-cores for 5 minute job, and they just
>>> quit when they are done. Then those processor cores can do something else
>>> entirely, they are not reserved for Spark at all.
>>>
>>> On Thu, May 19, 2016 at 4:55 PM Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Then in theory every user can fire multiple spark-submit jobs. do you
>>>> cap it with settings in  $SPARK_HOME/conf/spark-defaults.conf , but I
>>>> guess in reality every user submits one job only.
>>>>
>>>> This is an interesting model for two reasons:
>>>>
>>>>
>>>>    - It uses parallel processing across all the nodes or most of the
>>>>    nodes to minimise the processing time
>>>>    - it requires less intervention
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 19 May 2016 at 21:33, Mathieu Longtin <mathieu@closetwork.org>
>>>> wrote:
>>>>
>>>>> Driver memory is default. Executor memory depends on job, the caller
>>>>> decides how much memory to use. We don't specify --num-executors as we
want
>>>>> all cores assigned to the local master, since they were started by the
>>>>> current user. No local executor.  --master=spark://localhost:someport.
1
>>>>> core per executor.
>>>>>
>>>>> On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Thanks Mathieu
>>>>>>
>>>>>> So it would be interesting to see what resources allocated in your
>>>>>> case, especially the num-executors and executor-cores. I gather every
node
>>>>>> has enough memory and cores.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>
>>>>>>                 --master local[2] \
>>>>>>
>>>>>>                 --driver-memory 4g \
>>>>>>
>>>>>>                 --num-executors=1 \
>>>>>>
>>>>>>                 --executor-memory=4G \
>>>>>>
>>>>>>                 --executor-cores=2 \
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 19 May 2016 at 21:02, Mathieu Longtin <mathieu@closetwork.org>
>>>>>> wrote:
>>>>>>
>>>>>>> The driver (the process started by spark-submit) runs locally.
The
>>>>>>> executors run on any of thousands of servers. So far, I haven't
tried more
>>>>>>> than 500 executors.
>>>>>>>
>>>>>>> Right now, I run a master on the same server as the driver.
>>>>>>>
>>>>>>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> ok so you are using some form of NFS mounted file system
shared
>>>>>>>> among the nodes and basically you start the processes through
spark-submit.
>>>>>>>>
>>>>>>>> In Stand-alone mode, a simple cluster manager included with
>>>>>>>> Spark. It does the management of resources so it is not clear
to
>>>>>>>> me what you are referring as worker manager here?
>>>>>>>>
>>>>>>>> This is my take from your model.
>>>>>>>>  The application will go and grab all the cores in the cluster.
>>>>>>>> You only have one worker that lives within the driver JVM
process.
>>>>>>>> The Driver node runs on the same host that the cluster manager
is
>>>>>>>> running. The Driver requests the Cluster Manager for resources
to run
>>>>>>>> tasks. In this case there is only one executor for the Driver?
The Executor
>>>>>>>> runs tasks for the Driver.
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 19 May 2016 at 20:37, Mathieu Longtin <mathieu@closetwork.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> No master and no node manager, just the processes that
do actual
>>>>>>>>> work.
>>>>>>>>>
>>>>>>>>> We use the "stand alone" version because we have a shared
file
>>>>>>>>> system and a way of allocating computing resources already
(Univa Grid
>>>>>>>>> Engine). If an executor were to die, we have other ways
of restarting it,
>>>>>>>>> we don't need the worker manager to deal with it.
>>>>>>>>>
>>>>>>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Mathieu
>>>>>>>>>>
>>>>>>>>>> What does this approach provide that the norm lacks?
>>>>>>>>>>
>>>>>>>>>> So basically each node has its master in this model.
>>>>>>>>>>
>>>>>>>>>> Are these supposed to be individual stand alone servers?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <mathieu@closetwork.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> First a bit of context:
>>>>>>>>>>> We use Spark on a platform where each user start
workers as
>>>>>>>>>>> needed. This has the advantage that all permission
management is handled by
>>>>>>>>>>> the OS, so the users can only read files they
have permission to.
>>>>>>>>>>>
>>>>>>>>>>> To do this, we have some utility that does the
following:
>>>>>>>>>>> - start a master
>>>>>>>>>>> - start worker managers on a number of servers
>>>>>>>>>>> - "submit" the Spark driver program
>>>>>>>>>>> - the driver then talks to the master, tell it
how many
>>>>>>>>>>> executors it needs
>>>>>>>>>>> - the master tell the worker nodes to start executors
and talk
>>>>>>>>>>> to the driver
>>>>>>>>>>> - the executors are started
>>>>>>>>>>>
>>>>>>>>>>> From here on, the master doesn't do much, neither
do the process
>>>>>>>>>>> manager on the worker nodes.
>>>>>>>>>>>
>>>>>>>>>>> What I would like to do is simplify this to:
>>>>>>>>>>> - Start the driver program
>>>>>>>>>>> - Start executors on a number of servers, telling
them where to
>>>>>>>>>>> find the driver
>>>>>>>>>>> - The executors connect directly to the driver
>>>>>>>>>>>
>>>>>>>>>>> Is there a way I could do this without the master
and worker
>>>>>>>>>>> managers?
>>>>>>>>>>>
>>>>>>>>>>> Thanks!
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Mathieu Longtin
>>>>>>>>>>> 1-514-803-8977
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>> Mathieu Longtin
>>>>>>>>> 1-514-803-8977
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>> Mathieu Longtin
>>>>>>> 1-514-803-8977
>>>>>>>
>>>>>>
>>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
>>>>
>>>> --
>>> Mathieu Longtin
>>> 1-514-803-8977
>>>
>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>

Mime
View raw message