I ran these from multiple bash shells for now; a multi-threaded Python script would probably do as well. Memory and resource allocations are passed as spark-submit parameters.
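For what it's worth, a minimal sketch of what that Python script might look like (the spark-submit path, app paths, and memory settings below are placeholders, not the actual jobs):

# Launch several local-mode spark-submit jobs in parallel from one script.
# Paths and memory settings are placeholders; resources are still passed as
# plain spark-submit parameters, just as in the bash runs below.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SPARK_SUBMIT = "/usr/local/spark/bin/spark-submit"  # assumed install location

jobs = [
    {"app": "/path/to/app1.py", "driver_mem": "2G"},
    {"app": "/path/to/app2.py", "driver_mem": "2G"},
]

def submit(job):
    # Each call starts one driver JVM (driver + executor in-process).
    cmd = [
        SPARK_SUBMIT,
        "--master", "local[2]",
        "--driver-memory", job["driver_mem"],
        job["app"],
    ]
    return subprocess.run(cmd).returncode

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    print(list(pool.map(submit, jobs)))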


Before running any applications:

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    4066296    3992272      10172     141368    1549520
-/+ buffers/cache:    2375408    5683160
Swap:      8290300     108672    8181628
  

With only one app running:

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    4494488    3564080      10172     141392    1549948
-/+ buffers/cache:    2803148    5255420
Swap:      8290300     108672    8181628


I ran the single app twice in parallel (memory used roughly doubles, as expected):

[root@fos-elastic02 ~]# /usr/bin/free
             total       used       free     shared    buffers     cached
Mem:       8058568    4919532    3139036      10172     141444    1550376
-/+ buffers/cache:    3227712    4830856
Swap:      8290300     108672    8181628
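Quick arithmetic on the "used" column of the -/+ buffers/cache lines above (values in KB), just to sanity-check that each app adds roughly the same amount:

# Per-app memory increment from the three free snapshots above (KB).
baseline = 2375408   # before any application
one_app  = 2803148   # with one application running
two_apps = 3227712   # with two applications running

print((one_app - baseline) / 1024)    # ~418 MB for the first app
print((two_apps - one_app) / 1024)    # ~415 MB for the second app
print((two_apps - baseline) / 1024)   # ~832 MB total, roughly double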


I'm curious to know whether local mode is used in real deployments where resources are scarce.


Thanks, 
Sujeet

On Sat, May 28, 2016 at 10:50 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
OK, that is good news. So, briefly, how do you kick off spark-submit (or SparkConf) for each one, in terms of memory/resource allocations?

Now what is the output of

/usr/bin/free




On 28 May 2016 at 18:12, sujeet jog <sujeet.jog@gmail.com> wrote:
Yes Mich,
they are currently emitting the results in parallel on http://localhost:4040 and http://localhost:4041; I can also see the monitoring from these URLs.
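For what it's worth, a small check I can script (assuming Spark 1.4 or later, whose driver UI exposes a REST endpoint at /api/v1/applications) to confirm both driver UIs are responding on the auto-incremented ports:

# Poll each driver UI's REST endpoint and print the running application names.
import json
from urllib.request import urlopen

for port in (4040, 4041):
    url = "http://localhost:%d/api/v1/applications" % port
    try:
        apps = json.loads(urlopen(url, timeout=5).read().decode("utf-8"))
        print(port, [a["name"] for a in apps])
    except Exception as exc:
        print(port, "not reachable:", exc)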


On Sat, May 28, 2016 at 10:37 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
OK, they are submitted, but is the latter one (14302) actually doing anything?

Can you check it with jmonitor or the logs it creates?

HTH




On 28 May 2016 at 18:03, sujeet jog <sujeet.jog@gmail.com> wrote:
Thanks Ted, 

Thanks Mich. Yes, I see that I can run two applications by submitting them this way; presumably the Driver and Executor run in a single JVM (in-process Spark).

I'm wondering if this can be used in production systems. The reason I'm considering local mode instead of standalone cluster mode is purely CPU/memory resources; i.e., I currently do not have the liberty to run 1 Driver and 1 Executor JVM per application (this runs on an embedded network switch).
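For reference, a minimal PySpark sketch (not my actual application; the app name and workload are placeholders) of what each such local-mode app amounts to:

# Driver and executor run as threads inside this one JVM, sized only by
# the driver-memory setting passed to spark-submit.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[2]").setAppName("in-process-app")
sc = SparkContext(conf=conf)

# trivial placeholder workload
print(sc.parallelize(range(1000000)).count())

sc.stop()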


jps output:
[root@fos-elastic02 ~]# jps
14258 SparkSubmit
14503 Jps
14302 SparkSubmit

On Sat, May 28, 2016 at 10:21 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
OK, so you want to run all this in local mode. In other words, something like below:

${SPARK_HOME}/bin/spark-submit \
                --master local[2] \
                --driver-memory 2G \
                --num-executors=1 \
                --executor-memory=2G \
                --executor-cores=2 \

I am not sure it will work for multiple drivers (one app/JVM each). The only way to find out is to try running two apps simultaneously. You have a number of tools:


  1. Use jps to see the apps and their PIDs (a small sketch follows).
  2. Use jmonitor to see memory/CPU/heap usage for each spark-submit job.
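For item 1, a small sketch (assuming a JDK's jps is on the PATH) that lists the SparkSubmit driver PIDs:

# List the PIDs of running SparkSubmit (driver) JVMs via jps.
import subprocess

out = subprocess.check_output(["jps"]).decode()
pids = [line.split()[0] for line in out.splitlines() if "SparkSubmit" in line]
print("SparkSubmit PIDs:", pids)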
HTH


On 28 May 2016 at 17:41, Ted Yu <yuzhihong@gmail.com> wrote:

On Sat, May 28, 2016 at 9:19 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
Hi Sujeet,

If you have a single machine, then it is Spark standalone mode.

In Standalone cluster mode Spark allocates resources based on cores. By default, an application will grab all the cores in the cluster.

You only have one worker, which lives within the driver JVM process that you start when you launch the application with spark-shell or spark-submit on the host where the cluster manager is running.

The Driver runs on the same host as the cluster manager. The Driver requests resources from the Cluster Manager to run tasks. The worker is tasked with creating the executor (in this case there is only one executor) for the Driver. The Executor runs tasks for the Driver. Only one executor can be allocated on each worker per application. In your case you only have one worker.


The minimum you will need is 2-4 GB of RAM and two cores; well, that is my experience. Yes, you can submit more than one spark-submit job (driver), but they may queue up behind the running one if there are not enough resources.


You pointed out that you will be running a few applications in parallel on the same host. The likelihood is that you are using a VM for this purpose, and the best option is to try running the first one. Check the Web GUI on port 4040 to see the progress of this job. If you start the next JVM, then, assuming it is working, it will use port 4041 and so forth.


In actual fact try the command "free" to see how much free memory you have.


HTH






On 28 May 2016 at 16:42, sujeet jog <sujeet.jog@gmail.com> wrote:
Hi, 

I have a question w.r.t. the production deployment mode of Spark.

I have 3 applications which I would like to run independently on a single machine; I need to run the drivers on the same machine.

The amount of resources I have is also limited: about 4-5 GB RAM and 3-4 cores.

For deployment in standalone mode, I believe I need:

1 Driver JVM,  1 worker node ( 1 executor ) 
1 Driver JVM,  1 worker node ( 1 executor ) 
1 Driver JVM,  1 worker node ( 1 executor ) 

The issue here is that I will require 6 JVMs running in parallel, for which I do not have sufficient CPU/memory resources.


Hence I was looking more towards a local-mode deployment, and would like to know if anybody is using local mode (where the Driver and Executor run in a single JVM) in production.

Are there any inherent issues with using local mode for production systems?