spark-user mailing list archives

From 大啊 <belie...@163.com>
Subject Re:Performance Issue
Date Wed, 09 Jan 2019 01:52:54 GMT
What is your performance issue?






At 2019-01-08 22:09:24, "Tzahi File" <tzahi.file@ironsrc.com> wrote:

Hello, 


I have a performance issue running a SQL query on Spark.


The query involves one Parquet table partitioned by date, where each partition is
about 200 GB, and one simple table with about 100 records. The Spark cluster uses
m5.2xlarge instances (8 cores each). I'm using the Qubole interface to run the SQL query.


After searching for ways to improve my query, I added the following settings to the
configuration:
spark.sql.shuffle.partitions=1000
spark.dynamicAllocation.maxExecutors=200
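
To put these two settings in perspective, here is a rough back-of-the-envelope
sketch in plain Python (no Spark required). It only uses the figures quoted in
this message. The function names are hypothetical, and two things are
assumptions rather than facts about the actual job: that a whole 200 GB
partition is shuffled evenly, and that each executor uses all 8 cores of one
m5.2xlarge node.

```python
# Figures taken from the message.
PARTITION_SIZE_GB = 200     # size of one date partition
SHUFFLE_PARTITIONS = 1000   # spark.sql.shuffle.partitions
MAX_EXECUTORS = 200         # spark.dynamicAllocation.maxExecutors

# Assumption (not stated in the message): one executor per node, using all 8 cores.
CORES_PER_EXECUTOR = 8


def shuffle_partition_size_mb(input_gb, shuffle_partitions):
    """Average data per shuffle partition, in MB.

    Assumes the entire input is shuffled with no filtering and no skew,
    so this is an upper-bound estimate, not a measurement.
    """
    return input_gb * 1024 / shuffle_partitions


def task_waves(shuffle_partitions, executors, cores_per_executor):
    """Number of scheduling waves needed to run one task per shuffle partition."""
    slots = executors * cores_per_executor
    return -(-shuffle_partitions // slots)  # ceiling division


if __name__ == "__main__":
    size_mb = shuffle_partition_size_mb(PARTITION_SIZE_GB, SHUFFLE_PARTITIONS)
    waves = task_waves(SHUFFLE_PARTITIONS, MAX_EXECUTORS, CORES_PER_EXECUTOR)
    print(f"approx. {size_mb:.1f} MB per shuffle partition")
    print(f"{waves} task wave(s) at full allocation")
```

Under these assumptions, each shuffle partition would hold roughly 205 MB, and
the 1000 shuffle tasks would fit in a single wave on 1600 task slots. The real
numbers depend on how much data the query actually shuffles and on how the
executors are sized.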


This did not bring any significant improvement. I'm looking for ideas to reduce the
running time.




Thanks! 
Tzahi 

