spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: local Vs Standalonecluster production deployment
Date Sat, 28 May 2016 17:50:32 GMT
Hang on. free is telling me you have 8GB of memory. I was under the
impression that you had 4GB of RAM :)

So with no app running you have 3.99GB free, i.e. ~4GB.
The 1st app takes 428MB of memory and the second 425MB, so these are pretty lean apps.
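
As a rough check of those figures, here is a minimal Python sketch that takes the differences between the "used" values (in KB) from the three free outputs quoted below. The variable and function names are only illustrative.

```python
# "used" column (in KB) from the three `free` runs quoted in this thread.
used_kb = {"no apps": 4066296, "one app": 4494488, "two apps": 4919532}


def deltas_kb(samples):
    """Return the growth in used memory between consecutive samples."""
    values = list(samples.values())
    return [later - earlier for earlier, later in zip(values, values[1:])]


per_app_kb = deltas_kb(used_kb)
for n, kb in enumerate(per_app_kb, start=1):
    # free reports KB here; dividing by 1000 gives the rough MB figure.
    print(f"app {n}: {kb} KB (~{kb // 1000} MB)")
```

This only measures the change in the "used" column, which on this version of free includes buffers and cache, so treat it as an approximation of each app's footprint.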

For comparison, the apps that I run take 2-3GB each, but your mileage may
vary. If you still have free memory while running these tiny apps and there
is no sudden spike in memory/CPU usage, then as long as they run and finish
within SLA you should be OK in whichever environment you run. Maybe your apps
do not require that amount of memory.

I don't think there is a clear-cut reason NOT to use local mode in prod.
Others may have different opinions on this.

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 May 2016 at 18:37, sujeet jog <sujeet.jog@gmail.com> wrote:

> I ran these from multiple bash shells for now; a multi-threaded
> python script would probably do. Memory and resource allocations are
> passed as submit parameters.
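
A minimal sketch of what such a multi-threaded Python launcher could look like, assuming spark-submit is on the PATH. The application paths, core count and memory values are hypothetical placeholders, not the settings used in this thread.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(app_path, cores=1, driver_memory="512m",
                  spark_submit="spark-submit"):
    """Build a local-mode spark-submit command line for one application."""
    return [spark_submit,
            "--master", f"local[{cores}]",
            "--driver-memory", driver_memory,
            app_path]


def run_all(app_paths, runner=subprocess.call):
    """Launch every app from its own thread; return exit codes in order."""
    commands = [build_command(path) for path in app_paths]
    # One thread per app, since each thread just blocks on its child process.
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(runner, commands))


# Example call (hypothetical file names):
#   exit_codes = run_all(["app1.py", "app2.py", "app3.py"])
```

Each spark-submit still starts its own JVM; the threads only wait for the child processes to finish.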
>
>
> *Before running any applications:*
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4066296 *   3992272      10172     141368    1549520
> -/+ buffers/cache:    2375408    5683160
> Swap:      8290300     108672    8181628
>
>
> *only 1 App : *
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4494488*    3564080      10172     141392    1549948
> -/+ buffers/cache:    2803148    5255420
> Swap:      8290300     108672    8181628
>
>
> Ran the same app twice in parallel (memory used doubles, as expected):
>
> [root@fos-elastic02 ~]# /usr/bin/free
>              total       used       free     shared    buffers     cached
> Mem:       8058568    *4919532 *   3139036      10172     141444    1550376
> -/+ buffers/cache:    3227712    4830856
> Swap:      8290300     108672    8181628
>
>
> Curious to know if local mode is used in real deployments where there is a
> scarcity of resources.
>
>
> Thanks,
> Sujeet
>
> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> OK that is good news. So briefly, how do you kick off spark-submit (or set
>> SparkConf) for each, in terms of memory/resource allocations?
>>
>> Now what is the output of
>>
>> /usr/bin/free
>>
>>
>> On 28 May 2016 at 18:12, sujeet jog <sujeet.jog@gmail.com> wrote:
>>
>>> Yes Mich,
>>> They are currently emitting the results in parallel, at
>>> http://localhost:4040 & http://localhost:4041. I also see the
>>> monitoring from these URLs.
>>>
>>>
>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> OK, they are submitted, but is the latter one (14302) doing anything?
>>>>
>>>> Can you check it with jmonitor or the logs created?
>>>>
>>>> HTH
>>>>
>>>>
>>>> On 28 May 2016 at 18:03, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>
>>>>> Thanks Ted,
>>>>>
>>>>> Thanks Mich, yes I see that I can run two applications by submitting
>>>>> these, probably with Driver + Executor running in a single JVM
>>>>> (in-process Spark).
>>>>>
>>>>> I am wondering if this can be used in production systems. The reason I
>>>>> am considering local instead of standalone cluster mode is purely
>>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>>>>> Driver & 1 Executor per application (running in an embedded network
>>>>> switch).
>>>>>
>>>>>
>>>>> jps output
>>>>> [root@fos-elastic02 ~]# jps
>>>>> 14258 SparkSubmit
>>>>> 14503 Jps
>>>>> 14302 SparkSubmit
>>>>>
>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>> something like below
>>>>>>
>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>                 --master local[2] \
>>>>>>                 --driver-memory 2G \
>>>>>>                 --num-executors=1 \
>>>>>>                 --executor-memory=2G \
>>>>>>                 --executor-cores=2 \
>>>>>>
>>>>>>
>>>>>> I am not sure it will work for multiple drivers (app/JVM). The only
>>>>>> way you can find out is to try running two apps simultaneously. You
>>>>>> have a number of tools.
>>>>>>
>>>>>>
>>>>>>
>>>>>>    1. use jps to see the apps and PID
>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>    spark-submit job
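
For the jps step, a small sketch of picking out the SparkSubmit PIDs from its output. It assumes the plain "PID name" format that jps prints by default; the sample text is the jps output posted elsewhere in this thread, and the function name is only illustrative.

```python
def spark_submit_pids(jps_output):
    """Return the PIDs of SparkSubmit JVMs listed in `jps` output."""
    pids = []
    for line in jps_output.splitlines():
        parts = line.split()
        # Default jps lines look like "<pid> <main class name>".
        if len(parts) == 2 and parts[0].isdigit() and parts[1] == "SparkSubmit":
            pids.append(int(parts[0]))
    return pids


sample = """14258 SparkSubmit
14503 Jps
14302 SparkSubmit"""

print(spark_submit_pids(sample))  # [14258, 14302]
```

The resulting PIDs can then be handed to whatever monitoring tool is used to watch each job.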
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>
>>>>>>> Sujeet:
>>>>>>>
>>>>>>> Please also see:
>>>>>>>
>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>
>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Sujeet,
>>>>>>>>
>>>>>>>> If you have a single machine then it is Spark standalone mode.
>>>>>>>>
>>>>>>>> In Standalone cluster mode Spark allocates resources based on
>>>>>>>> cores. By default, an application will grab all the cores in the
>>>>>>>> cluster.
>>>>>>>>
>>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>>> that you start when you start the application with spark-shell or
>>>>>>>> spark-submit on the host where the cluster manager is running.
>>>>>>>>
>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>> running on. The Driver requests resources from the Cluster Manager
>>>>>>>> to run tasks. The worker is tasked to create the executor (in this
>>>>>>>> case there is only one executor) for the Driver. The Executor runs
>>>>>>>> tasks for the Driver. Only one executor can be allocated on each
>>>>>>>> worker per application. In your case you only have
>>>>>>>>
>>>>>>>>
>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well,
>>>>>>>> that is my experience. Yes, you can submit more than one
>>>>>>>> spark-submit (the driver) but they may queue up behind the running
>>>>>>>> one if there are not enough resources.
>>>>>>>>
>>>>>>>>
>>>>>>>> You pointed out that you will be running a few applications in
>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>> for this purpose, and the best option is to try running the first
>>>>>>>> one and check the Web GUI on port 4040 to see the progress of this
>>>>>>>> job. If you start the next JVM then, assuming it is working, it will
>>>>>>>> be using port 4041 and so forth.
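
A minimal sketch of checking which of those UI ports are actually answering, assuming the apps run on localhost and take consecutive ports starting at 4040 as described above. The helper names are hypothetical.

```python
import socket


def candidate_ui_ports(n_apps, base_port=4040):
    """Ports the web UIs would use if each new app takes the next port."""
    return [base_port + i for i in range(n_apps)]


def is_listening(port, host="localhost", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for port in candidate_ui_ports(3):
        state = "up" if is_listening(port) else "not listening"
        print(f"http://localhost:{port} -> {state}")
```

A port that answers only shows that some process is bound to it, so it is still worth opening the page to confirm it is the expected application.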
>>>>>>>>
>>>>>>>>
>>>>>>>> In actual fact, try the command "free" to see how much free memory
>>>>>>>> you have.
>>>>>>>>
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>>
>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>>
>>>>>>>>> I have 3 applications which I would like to run independently on a
>>>>>>>>> single machine. I need to run the drivers on the same machine.
>>>>>>>>>
>>>>>>>>> The amount of resources I have is also limited: about 4-5GB RAM and
>>>>>>>>> 3-4 cores.
>>>>>>>>>
>>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>>
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>
>>>>>>>>> The issue here is that I will require 6 JVMs running in parallel,
>>>>>>>>> for which I do not have sufficient CPU/MEM resources.
>>>>>>>>>
>>>>>>>>> Hence I was looking more towards a local mode deployment. I would
>>>>>>>>> like to know if anybody is using local mode, where Driver +
>>>>>>>>> Executor run in a single JVM, in production.
>>>>>>>>>
>>>>>>>>> Are there any inherent issues with using local mode for
>>>>>>>>> production systems?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
