spark-user mailing list archives

From sujeet jog <sujeet....@gmail.com>
Subject Re: local Vs Standalonecluster production deployment
Date Sat, 28 May 2016 17:37:29 GMT
I ran these from multiple bash shells for now; a multi-threaded Python
script would probably do as well. Memory and resource allocations are as
given in the submitted parameters.
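
A rough sketch of what such a launcher script could look like. It uses subprocess.Popen (which does not block) rather than threads, and the function names, application paths and default values are placeholders chosen for illustration, not the settings actually used here. Only the --master and --driver-memory options of spark-submit are assumed.

```python
import subprocess


def build_command(app_path, cores=1, driver_memory="1g", spark_submit="spark-submit"):
    """Return the argv list for one local-mode spark-submit invocation.

    In local mode the driver and executor share one JVM, so only the
    driver memory setting is passed here (illustrative defaults).
    """
    return [
        spark_submit,
        "--master", f"local[{cores}]",
        "--driver-memory", driver_memory,
        app_path,
    ]


def launch_all(app_paths, **kwargs):
    """Start one spark-submit process per application, then wait for all.

    Popen returns immediately, so the applications run in parallel.
    Returns the list of exit codes in the same order as app_paths.
    """
    procs = [subprocess.Popen(build_command(path, **kwargs)) for path in app_paths]
    return [proc.wait() for proc in procs]
```

For example, launch_all(["app1.py", "app2.py"], cores=1, driver_memory="1g") would start both (hypothetical) applications side by side and return their exit codes once they finish.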


*Before running any applications:*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4066296 *   3992272      10172     141368    1549520
-/+ buffers/cache:    2375408    5683160
Swap:      8290300     108672    8181628


*With only 1 app running:*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4494488*    3564080      10172     141392    1549948
-/+ buffers/cache:    2803148    5255420
Swap:      8290300     108672    8181628


*Running the same app twice in parallel (the additional memory used roughly doubles, as expected):*

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    *4919532 *   3139036      10172     141444    1550376
-/+ buffers/cache:    3227712    4830856
Swap:      8290300     108672    8181628
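
As a quick check on the numbers above, the per-application increase can be worked out from the "used" column. This is only a sketch: it assumes free is reporting in its default unit of kibibytes, and it ignores whatever else on the box changed between the runs.

```python
# "used" column from the three free runs above, assumed to be in KiB
used_kib = {"idle": 4066296, "one_app": 4494488, "two_apps": 4919532}


def delta_mib(before_kib, after_kib):
    """Difference between two readings, converted from KiB to MiB."""
    return (after_kib - before_kib) / 1024


first = delta_mib(used_kib["idle"], used_kib["one_app"])
second = delta_mib(used_kib["one_app"], used_kib["two_apps"])

print(f"first app:  {first:.0f} MiB")   # about 418 MiB
print(f"second app: {second:.0f} MiB")  # about 415 MiB
```

So each local-mode application adds a little over 400 MiB of used memory on this machine.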


Curious to know if local mode is used in real deployments where there is a
scarcity of resources.


Thanks,
Sujeet

On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> OK, that is good news. So, briefly, how do you kick off spark-submit for each
> (or set sparkConf), in terms of memory/resource allocations?
>
> Now what is the output of
>
> /usr/bin/free
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 28 May 2016 at 18:12, sujeet jog <sujeet.jog@gmail.com> wrote:
>
>> Yes Mich,
>> They are currently emitting the results in parallel on
>> http://localhost:4040 & http://localhost:4041; I also see the
>> monitoring from these URLs.
>>
>>
>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> OK, they are submitted, but is the latter one (14302) doing anything?
>>>
>>> Can you check it with jmonitor or the logs created?
>>>
>>> HTH
>>>
>>> On 28 May 2016 at 18:03, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>
>>>> Thanks Ted,
>>>>
>>>> Thanks Mich, yes, I see that I can run two applications by submitting
>>>> these, probably with Driver + Executor running in a single JVM
>>>> (in-process Spark).
>>>>
>>>> I am wondering if this can be used in production systems. The reason I am
>>>> considering local instead of standalone cluster mode is purely
>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>>>> Driver & 1 Executor per application (this is running in an embedded
>>>> network switch).
>>>>
>>>>
>>>> jps output
>>>> [root@fos-elastic02 ~]# jps
>>>> 14258 SparkSubmit
>>>> 14503 Jps
>>>> 14302 SparkSubmit
>>>>
>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Ok so you want to run all this in local mode. In other words something
>>>>> like below
>>>>>
>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>                 --master local[2] \
>>>>>                 --driver-memory 2G \
>>>>>                 --num-executors=1 \
>>>>>                 --executor-memory=2G \
>>>>>                 --executor-cores=2 \
>>>>>
>>>>> I am not sure it will work for multiple drivers (app/JVM). The only
>>>>> way you can find out is to try running two apps simultaneously. You
>>>>> have a number of tools.
>>>>>
>>>>>
>>>>>
>>>>>    1. use jps to see the apps and PID
>>>>>    2. use jmonitor to see memory/cpu/heap usage for each spark-submit
>>>>>    job
>>>>>
>>>>> HTH
>>>>>
>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>
>>>>>> Sujeet:
>>>>>>
>>>>>> Please also see:
>>>>>>
>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>
>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Sujeet,
>>>>>>>
>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>
>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>> cores. By default, an application will grab all the cores in
>>>>>>> the cluster.
>>>>>>>
>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>> that you start when you start the application with spark-shell or
>>>>>>> spark-submit on the host where the cluster manager is running.
>>>>>>>
>>>>>>> The Driver node runs on the same host as the cluster manager.
>>>>>>> The Driver asks the Cluster Manager for resources to run
>>>>>>> tasks. The worker is tasked to create the executor (in this case
>>>>>>> there is only one executor) for the Driver. The Executor runs tasks
>>>>>>> for the Driver. Only one executor can be allocated on each worker
>>>>>>> per application. In your case you only have
>>>>>>>
>>>>>>>
>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well,
>>>>>>> that is my experience. Yes, you can submit more than one spark-submit
>>>>>>> (the driver), but they may queue up behind the running one if there
>>>>>>> are not enough resources.
>>>>>>>
>>>>>>>
>>>>>>> You pointed out that you will be running a few applications in
>>>>>>> parallel on the same host. The likelihood is that you are using a
>>>>>>> VM for this purpose, and the best option is to try running the first
>>>>>>> one and check the Web GUI on 4040 to see the progress of this job.
>>>>>>> If you start the next JVM then, assuming it is working, it will be
>>>>>>> using port 4041 and so forth.
>>>>>>>
>>>>>>>
>>>>>>> In actual fact try the command "free" to see how much free memory
>>>>>>> you have.
>>>>>>>
>>>>>>>
>>>>>>> HTH
>>>>>>>
>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>
>>>>>>>> I have 3 applications which I would like to run independently on a
>>>>>>>> single machine; I need to run the drivers on the same machine.
>>>>>>>>
>>>>>>>> The amount of resources I have is also limited: about 4-5 GB RAM
>>>>>>>> and 3-4 cores.
>>>>>>>>
>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>
>>>>>>>> The issue here is that I will require 6 JVMs running in parallel,
>>>>>>>> for which I do not have sufficient CPU/MEM resources.
>>>>>>>>
>>>>>>>>
>>>>>>>> Hence I was looking more towards a local mode deployment, and
>>>>>>>> would like to know if anybody is using local mode, where Driver +
>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>
>>>>>>>> Are there any inherent issues upfront with using local mode for
>>>>>>>> production systems?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
