spark-user mailing list archives

From sujeet jog <sujeet....@gmail.com>
Subject Re: local Vs Standalonecluster production deployment
Date Sat, 28 May 2016 17:12:46 GMT
Yes Mich,
They are currently emitting their results in parallel. The UIs are at http://localhost:4040
and http://localhost:4041, and I can also see the monitoring pages at these URLs.
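
A minimal sketch for checking both UIs from the shell rather than a browser. It assumes curl is installed and that the two drivers bound ports 4040 and 4041 as described above. It queries the monitoring REST endpoint that a running Spark application serves under /api/v1 and only reports whether each UI answers; the function names are made up for illustration.

```shell
#!/bin/sh
# Build the REST URL for a Spark application UI on a given local port.
spark_ui_url() {
    # $1 = UI port (4040 for the first driver, 4041 for the second, ...)
    echo "http://localhost:$1/api/v1/applications"
}

# Print the JSON application list for that port, or a note if nothing answers.
check_spark_ui() {
    curl -sf --max-time 2 "$(spark_ui_url "$1")" \
        || echo "no Spark UI answering on port $1"
}

# Ports are taken from the message above; adjust if more apps are running.
for port in 4040 4041; do
    echo "== port $port =="
    check_spark_ui "$port"
done
```

If a driver has exited, its port simply stops answering, so this is also a quick way to see which of the two SparkSubmit processes is still serving a UI.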


On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> OK, they are both submitted, but is the latter one (14302) actually doing anything?
>
> Can you check it with jmonitor or the logs it created?
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 28 May 2016 at 18:03, sujeet jog <sujeet.jog@gmail.com> wrote:
>
>> Thanks Ted,
>>
>> Thanks Mich, yes, I see that I can run two applications by submitting
>> these, presumably with Driver + Executor running in a single JVM (in-process
>> Spark).
>>
>> I am wondering whether this can be used in production systems. The reason I am
>> considering local mode instead of standalone cluster mode is purely
>> CPU/MEM resources, i.e. I currently do not have the liberty to use 1
>> Driver & 1 Executor per application (this is running in an embedded network
>> switch).
>>
>>
>> jps output
>> [root@fos-elastic02 ~]# jps
>> 14258 SparkSubmit
>> 14503 Jps
>> 14302 SparkSubmit
>>
>> On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Ok so you want to run all this in local mode. In other words something
>>> like below
>>>
>>> ${SPARK_HOME}/bin/spark-submit \
>>>                 --master local[2] \
>>>                 --driver-memory 2G \
>>>                 --num-executors=1 \
>>>                 --executor-memory=2G \
>>>                 --executor-cores=2 \
>>>
>>>
>>> I am not sure it will work for multiple drivers (app/JVM). The only way
>>> you can find out is to try running two apps simultaneously. You have
>>> a number of tools:
>>>
>>>
>>>
>>>    1. use jps to see the apps and PID
>>>    2. use jmonitor to see memory/cpu/heap usage for each spark-submit
>>>    job
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 28 May 2016 at 17:41, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> Sujeet:
>>>>
>>>> Please also see:
>>>>
>>>> https://spark.apache.org/docs/latest/spark-standalone.html
>>>>
>>>> On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Hi Sujeet,
>>>>>
>>>>> if you have a single machine then it is Spark standalone mode.
>>>>>
>>>>> In Standalone cluster mode Spark allocates resources based on cores.
>>>>> By default, an application will grab all the cores in the cluster.
>>>>>
>>>>> You only have one worker that lives within the driver JVM process that
>>>>> you start when you start the application with spark-shell or spark-submit
>>>>> in the host where the cluster manager is running.
>>>>>
>>>>> The Driver node runs on the same host that the cluster manager is
>>>>> running on. The Driver requests resources from the Cluster Manager to run
>>>>> tasks. The worker is tasked with creating the executor (in this case there is
>>>>> only one executor) for the Driver. The Executor runs tasks for the Driver.
>>>>> Only one executor can be allocated on each worker per application. In your
>>>>> case you only have
>>>>>
>>>>>
>>>>> The minimum you will need is 2-4G of RAM and two cores, at least in
>>>>> my experience. Yes, you can submit more than one spark-submit (the
>>>>> driver), but they may queue up behind the running one if there are not enough
>>>>> resources.
>>>>>
>>>>>
>>>>> You pointed out that you will be running a few applications in parallel
>>>>> on the same host. The likelihood is that you are using a VM for
>>>>> this purpose, and the best option is to try running the first one and check the Web
>>>>> GUI on port 4040 to see the progress of this job. If you then start the next JVM,
>>>>> assuming it is working, it will be using port 4041 and so forth.
>>>>>
>>>>>
>>>>> In actual fact try the command "free" to see how much free memory you
>>>>> have.
>>>>>
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 28 May 2016 at 16:42, sujeet jog <sujeet.jog@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a question about the production deployment mode of Spark.
>>>>>>
>>>>>> I have 3 applications which I would like to run independently on a
>>>>>> single machine, and I need to run the drivers on the same machine.
>>>>>>
>>>>>> The amount of resources I have is also limited: about 4-5GB RAM and 3-4
>>>>>> cores.
>>>>>>
>>>>>> For deployment in standalone mode, I believe I need:
>>>>>>
>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>> 1 Driver JVM,  1 worker node ( 1 executor )
>>>>>>
>>>>>> The issue here is that I will require 6 JVMs running in parallel, for which
>>>>>> I do not have sufficient CPU/MEM resources.
>>>>>>
>>>>>>
>>>>>> Hence I was looking more towards a local mode deployment and would
>>>>>> like to know if anybody is using local mode, where Driver + Executor run in
>>>>>> a single JVM, in production.
>>>>>>
>>>>>> Are there any inherent issues with using local mode for
>>>>>> production systems?
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
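
For reference, the JVM arithmetic from the original question at the bottom of the thread can be written out as a small script. This is a rough sketch under stated assumptions, not a sizing rule: it uses the counts from the question (1 driver + 1 executor JVM per application in standalone mode, 1 combined JVM in local mode), takes 4096 MB as the lower end of the 4-5GB mentioned, and ignores OS overhead as well as the standalone Master and Worker daemons, which would add further JVMs. The function names are hypothetical.

```shell
#!/bin/sh
# Rough JVM and memory budget for N Spark applications on one host.

# Number of application JVMs for a given deployment mode.
jvm_count() {
    # $1 = mode (standalone|local), $2 = number of applications
    case "$1" in
        standalone) echo $(( $2 * 2 )) ;;   # driver + executor per app
        local)      echo "$2" ;;            # driver and executor share one JVM
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Even split of RAM across JVMs (integer division, result in MB).
mem_per_jvm_mb() {
    # $1 = total RAM in MB, $2 = number of JVMs
    echo $(( $1 / $2 ))
}

apps=3
total_mb=4096   # assumption: lower end of the 4-5GB from the question

for mode in standalone local; do
    n=$(jvm_count "$mode" "$apps")
    echo "$mode: $n JVMs, about $(mem_per_jvm_mb "$total_mb" "$n") MB each"
done
```

The even split is only for comparison; in practice each application's --driver-memory would be set individually.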
