spark-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Re: Where does the Driver run?
Date Mon, 25 Mar 2019 02:31:31 GMT
Hello,

Is spark.driver.memory per job or shared across jobs? Should you do load
testing before setting it?
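
For concreteness, a minimal sketch of how I would supply it for a single
application (Scala; the app name, master URL and sizes below are only
placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder values; only meant to show where the setting lives.
    // Note: in client mode the driver JVM is already running, so
    // spark.driver.memory may need to be supplied at launch time
    // (spark-submit or spark-defaults) rather than here in code.
    val conf = new SparkConf()
      .setAppName("load-test-job")
      .setMaster("spark://master-address:7077")
      .set("spark.driver.memory", "4g")      // heap for this application's driver
      .set("spark.executor.memory", "8g")    // heap per executor for this application
    val sc = new SparkContext(conf)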

Thanks & regards
Arko


On Sun, Mar 24, 2019 at 3:09 PM Pat Ferrel <pat@occamsmachete.com> wrote:

>
> 2 Slaves, one of which is also Master.
>
> Node 1 & 2 are slaves. Node 1 is where I run start-all.sh.
>
> The machines both have 60g of free memory (leaving about 4g for the master
> process on Node 1). The only constraint on the Driver and Executors is
> spark.driver.memory = spark.executor.memory = 60g
>
> BTW I would expect this to create one Executor, one Driver, and the Master
> on 2 Workers.
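>
> To spell out the arithmetic (rough numbers, assuming the Driver really does
> land on one of the two Workers):
>
>       free memory per node:                      ~60g
>       spark.executor.memory (one executor):       60g
>       spark.driver.memory (the driver):           60g
>       executor + driver on the same node:        120g  >  ~60g available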
>
>
>
>
> From: Andrew Melo <andrew.melo@gmail.com>
> Reply: Andrew Melo <andrew.melo@gmail.com>
> Date: March 24, 2019 at 12:46:35 PM
> To: Pat Ferrel <pat@occamsmachete.com>
> Cc: Akhil Das <akhld@hacked.work>, user <user@spark.apache.org>
> Subject:  Re: Where does the Driver run?
>
> Hi Pat,
>
> On Sun, Mar 24, 2019 at 1:03 PM Pat Ferrel <pat@occamsmachete.com> wrote:
>
>> Thanks, I have seen this many times in my research. Paraphrasing docs:
>> "in deployMode 'cluster' the Driver runs on a Worker in the cluster"
>>
>> When I look at logs I see 2 executors on the 2 slaves (executor 0 and 1
>> with addresses that match slaves). When I look at memory usage while the
>> job runs I see virtually identical usage on the 2 Workers. This would
>> support your claim and contradict Spark docs for deployMode = cluster.
>>
>> The evidence seems to contradict the docs. I am now beginning to wonder
>> if the Driver only runs in the cluster if we use spark-submit????
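>>
>> For reference, the spark-submit form I have in mind would be roughly this
>> (the class name and jar path are placeholders):
>>
>>       spark-submit \
>>         --master spark://master-address:7077 \
>>         --deploy-mode cluster \
>>         --driver-memory 60g \
>>         --executor-memory 60g \
>>         --class com.example.MyApp \
>>         /path/to/my-app.jar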
>>
>
> Where/how are you starting "./sbin/start-master.sh"?
>
> Cheers
> Andrew
>
>
>>
>>
>>
>> From: Akhil Das <akhld@hacked.work>
>> Reply: Akhil Das <akhld@hacked.work>
>> Date: March 23, 2019 at 9:26:50 PM
>> To: Pat Ferrel <pat@occamsmachete.com>
>> Cc: user <user@spark.apache.org>
>> Subject:  Re: Where does the Driver run?
>>
>> If you are starting your "my-app" on your local machine, that's where the
>> driver is running.
>>
>> [image: image.png]
>>
>> Hope this helps.
>> <https://spark.apache.org/docs/latest/cluster-overview.html>
>>
>> On Sun, Mar 24, 2019 at 4:13 AM Pat Ferrel <pat@occamsmachete.com> wrote:
>>
>>> I have researched this for a significant amount of time and find answers
>>> that seem to be for a slightly different question than mine.
>>>
>>> The Spark 2.3.3 cluster is running fine. I see the GUI on
>>> "http://master-address:8080"; there are 2 idle workers, as configured.
>>>
>>> I have a Scala application that creates a context and starts execution
>>> of a Job. I *do not use spark-submit*; I start the Job programmatically, and
>>> this is where many explanations fork from my question.
>>>
>>> In "my-app" I create a new SparkConf, with the following code (slightly
>>> abbreviated):
>>>
>>>       conf.setAppName("my-job")
>>>       conf.setMaster("spark://master-address:7077")
>>>       conf.set("deployMode", "cluster")
>>>       // other settings like driver and executor memory requests
>>>       // the driver and executor memory requests are for all mem on the
>>>       // slaves, more than mem available on the launching machine with "my-app"
>>>       val jars = listJars("/path/to/lib")
>>>       conf.setJars(jars)
>>>       …
>>>
>>> When I launch the job I see 2 executors running on the 2 workers/slaves.
>>> Everything seems to run fine and sometimes completes successfully. Frequent
>>> failures are the reason for this question.
>>>
>>> Where is the Driver running? I don't see it in the GUI; I see 2
>>> Executors taking all cluster resources. With a YARN cluster I would expect
>>> the "Driver" to run on/in the YARN master, but I am using the Spark
>>> Standalone Master, so where is the Driver part of the Job running?
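>>>
>>> One sanity check I can think of (just a sketch, logged from inside the
>>> application once the context exists):
>>>
>>>       // Prints the host the driver JVM believes it is running on
>>>       println("driver host: " + java.net.InetAddress.getLocalHost.getHostName)
>>>       // Spark also records it in the application's runtime config
>>>       println("spark.driver.host = " + sc.getConf.get("spark.driver.host", "unset"))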
>>>
>>> If it is running in the Master, we are in trouble because I start the
>>> Master on one of my 2 Workers, sharing resources with one of the Executors.
>>> Executor mem + driver mem is > available mem on a Worker. I can change this
>>> but need to understand where the Driver part of the Spark Job runs. Is it
>>> in the Spark Master, or inside an Executor, or ???
>>>
>>> The "Driver" creates and broadcasts some large data structures, so the
>>> need for an answer is more critical than with more typical tiny Drivers.
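>>>
>>> The broadcast pattern I mean is roughly this (the data structure, the
>>> loader and the rdd here are just stand-ins):
>>>
>>>       val bigModel: Map[String, Array[Double]] = loadModel()   // large, built on the driver
>>>       val bcModel = sc.broadcast(bigModel)                     // shipped once to each executor
>>>       val scored = someRdd.map(x => score(x, bcModel.value))   // read on the executors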
>>>
>>> Thanks for your help!
>>>
>>
>>
>> --
>> Cheers!
>>
>>
