spark-user mailing list archives

From Lakshmi Nivedita <klnived...@gmail.com>
Subject Spark [SQL] performance tuning
Date Thu, 12 Nov 2020 09:48:19 GMT
Hi all,

I have a PySpark SQL script that loads five tables (one of about 80 MB, one of
about 2 MB, and three small ones) and performs a lot of joins between them to
fetch the data.
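
For reference, a rough sketch of the shape of the script (table names, join
keys, and the output table below are placeholders, not the real ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_script").getOrCreate()

# Load the five source tables (roughly 80 MB, 2 MB, and three small ones)
big_df    = spark.table("db.big_table")       # ~80 MB
medium_df = spark.table("db.medium_table")    # ~2 MB
small1_df = spark.table("db.small_table_1")
small2_df = spark.table("db.small_table_2")
small3_df = spark.table("db.small_table_3")

# Chain of joins to assemble the result (keys are illustrative only)
result = (big_df
          .join(medium_df, "key1")
          .join(small1_df, "key2")
          .join(small2_df, "key3")
          .join(small3_df, "key4"))

# The final result written back to a table is about 24 MB of records
result.write.mode("overwrite").saveAsTable("db.output_table")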

My cluster configuration is:

4 nodes, 300 GB memory, 64 cores

Writing a data frame of about 24 MB of records into a table takes 4 minutes
2 seconds, with these parameters:

Driver memory: 5 GB
Executor memory: 20 GB
Executor cores: 5
Number of executors: 40
spark.dynamicAllocation.minExecutors: 40
spark.dynamicAllocation.maxExecutors: 40
spark.dynamicAllocation.initialExecutors: 17
Memory overhead: 4 GB

and the default of 200 shuffle partitions (spark.sql.shuffle.partitions).
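
Spelled out as code, that configuration corresponds roughly to the following
(the property names are my mapping of the settings listed above; in practice
driver memory and the executor settings are usually passed on the spark-submit
command line rather than in the script):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("join_script")
         # Driver memory normally has to be set before the JVM starts
         # (e.g. spark-submit --driver-memory 5g); shown here only to
         # mirror the value listed above.
         .config("spark.driver.memory", "5g")
         .config("spark.executor.memory", "20g")
         .config("spark.executor.cores", "5")
         .config("spark.executor.instances", "40")
         .config("spark.dynamicAllocation.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "40")
         .config("spark.dynamicAllocation.maxExecutors", "40")
         .config("spark.dynamicAllocation.initialExecutors", "17")
         .config("spark.executor.memoryOverhead", "4g")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())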

Could anyone please suggest how I can tune this job?

-- 
k.Lakshmi Nivedita
