spark-user mailing list archives

From: Akhil Das
Subject: Re: Where does the Driver run?
Date: Mon, 25 Mar 2019 02:44:54 GMT
There's also a driver UI, usually available on port 4040. I assume you are
running your code on your own machine, so after starting it, visit
localhost:4040 and you will get the driver UI.

If you think the driver is running on your master/executor nodes, log in to
those machines and run

   netstat -napt | grep -i listen

If the driver is running there, you will see it listening on 404x. That most
likely won't be the case, as you are not using spark-submit or
deployMode=cluster.
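
To narrow the netstat output down to just the driver UI ports, a small filter
along these lines may help. This is only a sketch: the function name
driver_ui_ports is made up for illustration, and it assumes the usual
netstat layout in which the local address ends in ":<port>" followed by
whitespace and the state column reads LISTEN.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): reads netstat-style lines on
# stdin and prints only listening sockets whose port is in 4040-4049.
driver_ui_ports() {
    # netstat prints the state in upper case (LISTEN), so match without
    # regard to case, then keep only ports 4040 through 4049.
    grep -i 'listen' | grep -E ':404[0-9][[:space:]]'
}
```

On a node you suspect, something like "netstat -napt | driver_ui_ports"
should then print a line only if something is listening in that port range.
Seeing the owning process name usually requires running netstat as root.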

On Mon, 25 Mar 2019, 01:03 Pat Ferrel wrote:

> Thanks, I have seen this many times in my research. Paraphrasing docs: "in
> deployMode 'cluster' the Driver runs on a Worker in the cluster"
> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
> with addresses that match slaves). When I look at memory usage while the
> job runs I see virtually identical usage on the 2 Workers. This would
> support your claim and contradict Spark docs for deployMode = cluster.
> The evidence seems to contradict the docs. I am now beginning to wonder if
> the Driver only runs in the cluster if we use spark-submit?
> From: Akhil Das
> Reply: Akhil Das
> Date: March 23, 2019 at 9:26:50 PM
> To: Pat Ferrel
> Cc: user
> Subject: Re: Where does the Driver run?
> If you are starting your "my-app" on your local machine, that's where the
> driver is running.
> Hope this helps.
> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel wrote:
>> I have researched this for a significant amount of time and find answers
>> that seem to be for a slightly different question than mine.
>> The Spark 2.3.3 cluster is running fine. I see the GUI on
>> "http://master-address:8080"; there are 2 idle workers, as configured.
>> I have a Scala application that creates a context and starts execution of
>> a Job. I *do not use spark-submit*; I start the Job programmatically, and
>> this is where many explanations fork from my question.
>> In "my-app" I create a new SparkConf, with the following code (slightly
>> abbreviated):
>>       conf.setAppName("my-job")
>>       conf.setMaster("spark://master-address:7077")
>>       conf.set("deployMode", "cluster")
>>       // other settings like driver and executor memory requests
>>       // the driver and executor memory requests are for all mem on the
>>       // slaves, more than mem available on the launching machine with
>>       // "my-app"
>>       val jars = listJars("/path/to/lib")
>>       conf.setJars(jars)
>>       …
>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>> Everything seems to run fine and sometimes completes successfully. Frequent
>> failures are the reason for this question.
>> Where is the Driver running? I don't see it in the GUI; I see 2 Executors
>> taking all cluster resources. With a Yarn cluster I would expect the
>> "Driver" to run on/in the Yarn Master, but I am using the Spark Standalone
>> Master. Where is the Driver part of the Job running?
>> If it is running in the Master, we are in trouble because I start the
>> Master on one of my 2 Workers, sharing resources with one of the Executors.
>> Executor mem + driver mem is > available mem on a Worker. I can change this
>> but need to understand where the Driver part of the Spark Job runs. Is it
>> in the Spark Master, or inside an Executor, or ???
>> The “Driver” creates and broadcasts some large data structures so the
>> need for an answer is more critical than with more typical tiny Drivers.
>> Thanks for your help!
> --
> Cheers!
