spark-user mailing list archives

From sujeet jog <sujeet....@gmail.com>
Subject Re: local Vs Standalonecluster production deployment
Date Sat, 28 May 2016 19:18:17 GMT
Great, Thanks.

On Sun, May 29, 2016 at 12:38 AM, Chris Fregly <chris@fregly.com> wrote:

> btw, here's a handy Spark Config Generator by Ewan Higgs in Gent,
> Belgium:
>
> code:  https://github.com/ehiggs/spark-config-gen
>
> demo:  http://ehiggs.github.io/spark-config-gen/
>
> my recent tweet on this:
> https://twitter.com/cfregly/status/736631633927753729
>
> On Sat, May 28, 2016 at 10:50 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Hang on. free is telling me you have 8GB of memory. I was under the
>> impression that you had 4GB of RAM :)
>>
>> So with no app running you have 3.99GB free, i.e. ~4GB.
>> The 1st app takes 428MB of memory and the second 425MB, so these are
>> pretty lean apps.
>>
>> The apps that I run take 2-3GB each, but your mileage may vary. If you
>> still have free memory while running these small apps and there is no
>> sudden spike in memory/CPU usage, then as long as they run and finish
>> within SLA you should be OK in whichever environment you run them. Maybe
>> your apps do not require that amount of memory.
>>
>> I don't think there is a clear-cut reason NOT to use local mode in prod.
>> Others may have different opinions on this.
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 28 May 2016 at 18:37, sujeet jog <sujeet.jog@gmail.com> wrote:
>>
>>> Ran these from multiple bash shells for now; a multi-threaded Python
>>> script would probably do as well. Memory and resource allocations are
>>> as given in the submitted parameters.
>>>
>>>
>>> *Before running any applications:*
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4066296    3992272      10172     141368    1549520
>>> -/+ buffers/cache:    2375408    5683160
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> *With only 1 app running:*
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4494488    3564080      10172     141392    1549948
>>> -/+ buffers/cache:    2803148    5255420
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> Ran the same app twice in parallel (the additional memory used doubles, as expected):
>>>
>>> [root@fos-elastic02 ~]# /usr/bin/free
>>>              total       used       free     shared    buffers     cached
>>> Mem:       8058568    4919532    3139036      10172     141444    1550376
>>> -/+ buffers/cache:    3227712    4830856
>>> Swap:      8290300     108672    8181628
>>>
>>>
>>> Curious to know if local mode is used in real deployments where there is
>>> a scarcity of resources.
>>>
>>>
>>> Thanks,
>>> Sujeet
>>>
>>> On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> OK, that is good news. So briefly, how do you kick off spark-submit (or
>>>> set SparkConf) for each, in terms of memory/resource allocations?
>>>>
>>>> Now what is the output of
>>>>
>>>> /usr/bin/free
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 28 May 2016 at 18:12, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>
>>>>> Yes Mich,
>>>>> They are currently emitting results in parallel. I can also monitor
>>>>> them at http://localhost:4040 and http://localhost:4041.
>>>>>
>>>>>
>>>>> On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> OK, they are submitted, but is the latter one (14302) doing anything?
>>>>>>
>>>>>> Can you check it with jmonitor or the logs it created?
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>> On 28 May 2016 at 18:03, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Ted,
>>>>>>>
>>>>>>> Thanks Mich, yes, I see that I can run two applications by submitting
>>>>>>> these, probably with Driver + Executor running in a single JVM
>>>>>>> (in-process Spark).
>>>>>>>
>>>>>>> I am wondering if this can be used in production systems. The reason
>>>>>>> for considering local instead of standalone cluster mode is purely
>>>>>>> CPU/MEM resources, i.e. I currently do not have the liberty to use
>>>>>>> 1 Driver & 1 Executor per application (running in an embedded network
>>>>>>> switch).
>>>>>>>
>>>>>>>
>>>>>>> jps output
>>>>>>> [root@fos-elastic02 ~]# jps
>>>>>>> 14258 SparkSubmit
>>>>>>> 14503 Jps
>>>>>>> 14302 SparkSubmit
>>>>>>>
>>>>>>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> Ok so you want to run all this in local mode. In other words
>>>>>>>> something like below
>>>>>>>>
>>>>>>>> ${SPARK_HOME}/bin/spark-submit \
>>>>>>>>                 --master local[2] \
>>>>>>>>                 --driver-memory 2G \
>>>>>>>>                 --num-executors=1 \
>>>>>>>>                 --executor-memory=2G \
>>>>>>>>                 --executor-cores=2 \
>>>>>>>>
>>>>>>>>
>>>>>>>> I am not sure it will work for multiple drivers (app/JVM). The only
>>>>>>>> way to find out is to try running two apps simultaneously. You have
>>>>>>>> a number of tools:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. use jps to see the apps and PID
>>>>>>>>    2. use jmonitor to see memory/cpu/heap usage for each
>>>>>>>>    spark-submit job
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>> On 28 May 2016 at 17:41, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sujeet:
>>>>>>>>>
>>>>>>>>> Please also see:
>>>>>>>>>
>>>>>>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>>>>>>
>>>>>>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Sujeet,
>>>>>>>>>>
>>>>>>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>>>>>>
>>>>>>>>>> In Standalone cluster mode Spark allocates resources based on cores.
>>>>>>>>>> By default, an application will grab all the cores in the cluster.
>>>>>>>>>>
>>>>>>>>>> You only have one worker that lives within the driver JVM process
>>>>>>>>>> that you start when you start the application with spark-shell or
>>>>>>>>>> spark-submit in the host where the cluster manager is running.
>>>>>>>>>>
>>>>>>>>>> The Driver node runs on the same host that the cluster manager is
>>>>>>>>>> running. The Driver requests the Cluster Manager for resources to run
>>>>>>>>>> tasks. The worker is tasked to create the executor (in this case there
>>>>>>>>>> is only one executor) for the Driver. The Executor runs tasks for the
>>>>>>>>>> Driver. Only one executor can be allocated on each worker per
>>>>>>>>>> application. In your case you only have
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The minimum you will need will be 2-4G of RAM and two cores. Well,
>>>>>>>>>> that is my experience. Yes, you can submit more than one spark-submit
>>>>>>>>>> (the driver) but they may queue up behind the running one if there
>>>>>>>>>> are not enough resources.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You pointed out that you will be running a few applications in
>>>>>>>>>> parallel on the same host. The likelihood is that you are using a VM
>>>>>>>>>> for this purpose, and the best option is to try running the first one
>>>>>>>>>> and check the Web GUI on port 4040 to see the progress of this job.
>>>>>>>>>> If you then start the next JVM, assuming it is working, it will be
>>>>>>>>>> using port 4041, and so forth.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In actual fact, try the command "free" to see how much free memory
>>>>>>>>>> you have.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HTH
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have a question w.r.t. the production deployment mode of Spark.
>>>>>>>>>>>
>>>>>>>>>>> I have 3 applications which I would like to run independently on a
>>>>>>>>>>> single machine; I need to run the drivers on the same machine.
>>>>>>>>>>>
>>>>>>>>>>> The amount of resources I have is also limited: about 4-5GB RAM and
>>>>>>>>>>> 3-4 cores.
>>>>>>>>>>>
>>>>>>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>>>>>>
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>>>>>>
>>>>>>>>>>> The issue here is that I will require 6 JVMs running in parallel,
>>>>>>>>>>> for which I do not have sufficient CPU/MEM resources.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hence I was looking more towards a local mode deployment, and would
>>>>>>>>>>> like to know if anybody is using local mode, where Driver + Executor
>>>>>>>>>>> run in a single JVM, in production.
>>>>>>>>>>>
>>>>>>>>>>> Are there any inherent issues with using local mode for
>>>>>>>>>>> production systems?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> *Chris Fregly*
> Research Scientist @ Flux Capacitor AI
> "Bringing AI Back to the Future!"
> San Francisco, CA
> http://fluxcapacitor.ai
>
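
A quick check of the memory figures discussed in this thread, as a minimal Python sketch. The three numbers are the "used" column of the `free` outputs quoted above, and the fourth is the free value on the "-/+ buffers/cache" line with two apps running. `free` reports KiB by default, so the per-app deltas work out to roughly 418 MiB and 415 MiB (the 428MB and 425MB mentioned above are the same deltas read as thousands of KB). The third-app estimate assumes a third app would cost about as much as the first two; that is an assumption for illustration, not a measurement.

```python
# "used" column (KiB) from the three `free` outputs quoted in the thread.
USED_KIB = {"no_apps": 4066296, "one_app": 4494488, "two_apps": 4919532}

# "free" value on the "-/+ buffers/cache" line with two apps running (KiB).
FREE_EXCL_CACHE_TWO_APPS_KIB = 4830856


def kib_to_mib(kib):
    """Convert KiB (the default unit of `free`) to MiB."""
    return kib / 1024


# Memory attributed to each app: the increase in "used" when it was started.
first_app_kib = USED_KIB["one_app"] - USED_KIB["no_apps"]
second_app_kib = USED_KIB["two_apps"] - USED_KIB["one_app"]

# Assumption: a third app would cost about the average of the first two.
avg_app_kib = (first_app_kib + second_app_kib) / 2
est_free_after_third_kib = FREE_EXCL_CACHE_TWO_APPS_KIB - avg_app_kib

print(f"first app:  {kib_to_mib(first_app_kib):.0f} MiB")
print(f"second app: {kib_to_mib(second_app_kib):.0f} MiB")
print(f"estimated free (excl. cache) after a third app: "
      f"{kib_to_mib(est_free_after_third_kib):.0f} MiB")
```

This only looks at steady-state numbers from a single sample; it says nothing about spikes under load, which is the caveat raised above about finishing within SLA.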
