spark-user mailing list archives

From: Vadim Semenov <va...@datadoghq.com>
Subject: Re: Inferring Data driven Spark parameters
Date: Tue, 03 Jul 2018 12:58:31 GMT
You can't change the executor/driver cores/memory on the fly once
you've already started a SparkContext.
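
A minimal sketch of what this means in practice, assuming a Scala application (the app name and the values below are illustrative, not from this thread): executor resources are read once when the SparkContext starts, so they have to be supplied up front via spark-submit or the session builder, whereas only runtime SQL settings can be adjusted afterwards.

import org.apache.spark.sql.SparkSession

// Executor resources are only read at SparkContext startup, so they must be
// supplied before getOrCreate(); spark.driver.memory in particular has to be
// given to spark-submit, because the driver JVM is already running by the
// time this code executes.
val spark = SparkSession.builder()
  .appName("config-at-startup-sketch")        // hypothetical app name
  .config("spark.executor.instances", "4")    // same as --num-executors
  .config("spark.executor.cores", "5")
  .config("spark.executor.memory", "10g")
  .getOrCreate()

// Runtime SQL settings such as the shuffle partition count CAN be changed
// after startup, e.g. once the input size is known:
spark.conf.set("spark.sql.shuffle.partitions", "100")

// By contrast, spark.conf.set("spark.executor.memory", ...) would not resize
// executors that are already running.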
On Tue, Jul 3, 2018 at 4:30 AM Aakash Basu <aakash.spark.raj@gmail.com> wrote:
>
> We aren't using Oozie or anything similar. Moreover, the end-to-end job will be exactly
> the same, but the data will be extremely different (number of continuous and categorical
> columns, vertical size, horizontal size, etc.). Hence, if there were a calculation of the
> parameters such that we could simply take the data and derive the respective
> configuration/parameters from it, that would be great.
>
> On Tue, Jul 3, 2018 at 1:09 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>
>> Don’t do this inside your job. Create different jobs for the different types of work and
>> orchestrate them using Oozie or similar.
>>
>> On 3. Jul 2018, at 09:34, Aakash Basu <aakash.spark.raj@gmail.com> wrote:
>>
>> Hi,
>>
>> Cluster - 5 node (1 Driver and 4 workers)
>> Driver Config: 16 cores, 32 GB RAM
>> Worker Config: 8 cores, 16 GB RAM
>>
>> I'm using the parameters below; I know the first chunk is cluster dependent and the
>> second chunk is data/code dependent.
>>
>> --num-executors 4
>> --executor-cores 5
>> --executor-memory 10G
>> --driver-cores 5
>> --driver-memory 25G
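
For the cluster-dependent chunk above, a common rule of thumb (a sketch only, not from this thread; the exact constants are assumptions) is to leave one core and about 1 GB per node for the OS and daemons, cap executors at roughly five cores each, and split the remaining memory with some headroom for spark.executor.memoryOverhead:

// Rough heuristic for the cluster-dependent parameters, using the
// 4 x (8 cores, 16 GB) workers from this thread.
val workerNodes  = 4
val coresPerNode = 8
val memPerNodeGb = 16

val usableCores  = coresPerNode - 1                  // keep 1 core for OS/daemons
val coresPerExec = math.min(5, usableCores)          // ~5 cores per executor is a common ceiling
val execsPerNode = usableCores / coresPerExec        // = 1 here
val numExecutors = workerNodes * execsPerNode        // = 4 (on YARN, subtract 1 for the AM)
val usableMemGb  = memPerNodeGb - 1                  // keep ~1 GB for the OS
val execMemGb    = (usableMemGb / execsPerNode * 0.9).toInt  // ~10% headroom for memoryOverhead

println(s"--num-executors $numExecutors --executor-cores $coresPerExec --executor-memory ${execMemGb}G")

With the 16 GB workers here this lands around 13G per executor; the 10G used in the thread simply leaves more headroom.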
>>
>>
>> --conf spark.sql.shuffle.partitions=100
>> --conf spark.driver.maxResultSize=2G
>> --conf "spark.executor.extraJavaOptions=-XX:+UseParallelGC"
>> --conf spark.scheduler.listenerbus.eventqueue.capacity=20000
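
For the data-dependent chunk, one widely used starting point (again just a sketch, and the sizes below are assumptions) is to derive spark.sql.shuffle.partitions from an estimate of the data being shuffled, aiming for partitions of roughly 128-200 MB, and to set it at runtime before the heavy stages run:

// Hypothetical helper: choose a shuffle partition count from an estimated
// shuffle size, targeting ~128 MB per partition.
def shufflePartitionsFor(estimatedShuffleBytes: Long,
                         targetPartitionBytes: Long = 128L * 1024 * 1024): Int =
  math.max(1, (estimatedShuffleBytes / targetPartitionBytes).toInt)

// Assuming the SparkSession `spark` is already running and the job will
// shuffle on the order of 20 GB (an assumed figure):
val partitions = shufflePartitionsFor(20L * 1024 * 1024 * 1024)   // = 160
spark.conf.set("spark.sql.shuffle.partitions", partitions.toString)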
>>
>> I've arrived at these values through my own R&D on these properties and the issues
>> I faced along the way, hence these particular knobs.
>>
>> My ask here is -
>>
>> 1) How can I infer, using some formula or code, the values in the second chunk, which
>> depend on the data/code?
>> 2) What other properties/configurations can I use to shorten my job's runtime?
>>
>> Thanks,
>> Aakash.
>
>


-- 
Sent from my iPhone

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

