Is spark.driver.memory per job or shared across jobs? You should do load testing before setting this.

Thanks & regards

On Sun, Mar 24, 2019 at 3:09 PM Pat Ferrel <pat@occamsmachete.com> wrote:

2 Slaves, one of which is also Master.

Node 1 & 2 are slaves. Node 1 is where I run start-all.sh.

The machines both have 60g of free memory (leaving about 4g for the master process on Node 1). The only constraint to the Driver and Executors is spark.driver.memory = spark.executor.memory = 60g

BTW I would expect this to create one Executor, one Driver, and the Master on 2 Workers.
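
A rough sketch of the arithmetic behind that expectation, using only the figures quoted above (about 60g free per node, 60g requested for each of the Driver and the Executor). The object and function names are made up for illustration, and the check ignores any JVM overhead beyond the requested heap, so treat it as an approximation rather than how Spark itself accounts for memory:

```scala
object MemoryFit {
  // Figures from the thread: roughly 60g free on each node,
  // spark.driver.memory = spark.executor.memory = 60g.
  val freePerNodeGb = 60
  val driverGb = 60
  val executorGb = 60

  /** True if the given heap requests fit together on one node (overhead ignored). */
  def fits(heapsGb: Seq[Int], freeGb: Int = freePerNodeGb): Boolean =
    heapsGb.sum <= freeGb

  def main(args: Array[String]): Unit = {
    // One process per node fits exactly.
    println(s"executor alone on a node: ${fits(Seq(executorGb))}")
    // A Driver placed next to an Executor on the same node does not.
    println(s"driver + executor on one node: ${fits(Seq(driverGb, executorGb))}")
  }
}
```

With these numbers the first check holds and the second does not, so the setup only works if the Driver and each Executor end up on separate nodes.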

From: Andrew Melo <andrew.melo@gmail.com>
Reply: Andrew Melo <andrew.melo@gmail.com>
Date: March 24, 2019 at 12:46:35 PM
To: Pat Ferrel <pat@occamsmachete.com>
Cc: Akhil Das <akhld@hacked.work>, user <user@spark.apache.org>
Subject:  Re: Where does the Driver run?

Hi Pat,

On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel <pat@occamsmachete.com> wrote:
Thanks, I have seen this many times in my research. Paraphrasing docs: “in deployMode ‘cluster’ the Driver runs on a Worker in the cluster”

When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1 with addresses that match slaves). When I look at memory usage while the job runs I see virtually identical usage on the 2 Workers. This would support your claim and contradict Spark docs for deployMode = cluster.

The evidence seems to contradict the docs. I am now beginning to wonder if the Driver only runs in the cluster if we use spark-submit????

Where/how are you starting "./sbin/start-master.sh"?


From: Akhil Das <akhld@hacked.work>
Reply: Akhil Das <akhld@hacked.work>
Date: March 23, 2019 at 9:26:50 PM
To: Pat Ferrel <pat@occamsmachete.com>
Cc: user <user@spark.apache.org>
Subject:  Re: Where does the Driver run?

If you are starting your "my-app" on your local machine, that's where the driver is running.
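
One way to check this from inside the application is sketched below. It assumes Spark is on the classpath and that the master URL is passed as the first argument (falling back to a local master so the sketch can be tried without a cluster); the object name WhereIsTheDriver is hypothetical. It sets the same "deployMode" key as the original post, then reports the driver host and deploy mode that the running SparkContext actually sees:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WhereIsTheDriver {
  /** Returns (spark.driver.host, deploy mode) as seen by a context created in this JVM. */
  def describe(master: String): (String, String) = {
    val conf = new SparkConf()
      .setAppName("my-app")
      .setMaster(master)
      // Same key as in the original post; it is not a documented Spark property
      // (the documented one is spark.submit.deployMode).
      .set("deployMode", "cluster")
    val sc = new SparkContext(conf)
    try {
      // SparkContext fills in spark.driver.host when it is not set explicitly.
      (sc.getConf.get("spark.driver.host"), sc.deployMode)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: pass the standalone master URL as the first argument.
    val master = args.headOption.getOrElse("local[1]")
    val (driverHost, deployMode) = describe(master)
    println(s"spark.driver.host = $driverHost")
    println(s"deploy mode       = $deployMode")
    println(s"launching host    = ${java.net.InetAddress.getLocalHost.getHostName}")
  }
}
```

If the driver host is the machine that launched "my-app" and the deploy mode is reported as client, that would be consistent with the driver living in the launching JVM rather than on a Worker.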


On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel <pat@occamsmachete.com> wrote:
I have researched this for a significant amount of time and find answers that seem to be for a slightly different question than mine.

The Spark 2.3.3 cluster is running fine. I see the GUI on “http://master-address:8080”; there are 2 idle workers, as configured.

I have a Scala application that creates a context and starts execution of a Job. I *do not use spark-submit*, I start the Job programmatically and this is where many explanations fork from my question.

In "my-app" I create a new SparkConf, with the following code (slightly abbreviated):

      conf.set("deployMode", "cluster")
      // other settings like driver and executor memory requests
      // the driver and executor memory requests are for all mem on the slaves, more than
      // mem available on the launching machine with "my-app"
      val jars = listJars("/path/to/lib")

When I launch the job I see 2 executors running on the 2 workers/slaves. Everything seems to run fine and sometimes completes successfully. Frequent failures are the reason for this question.

Where is the Driver running? I don’t see it in the GUI; I see 2 Executors taking all cluster resources. With a Yarn cluster I would expect the “Driver” to run on/in the Yarn Master, but I am using the Spark Standalone Master. Where is the Driver part of the Job running?

If it is running in the Master, we are in trouble because I start the Master on one of my 2 Workers, sharing resources with one of the Executors. Executor mem + driver mem is > available mem on a Worker. I can change this but need to understand where the Driver part of the Spark Job runs. Is it in the Spark Master, or inside an Executor, or ???

The “Driver” creates and broadcasts some large data structures so the need for an answer is more critical than with more typical tiny Drivers.

Thanks for your help!