spark-user mailing list archives

From Tzahi File <tzahi.f...@ironsrc.com>
Subject Performance Issue
Date Tue, 08 Jan 2019 14:09:24 GMT
Hello,

I have a performance issue running a SQL query on Spark.

The query joins a Parquet table partitioned by date, where each partition is
about 200 GB, with a simple table of about 100 records. The Spark cluster uses
m5.2xlarge instances (8 cores each). I'm using the Qubole interface for
running the SQL query.

After searching for ways to improve my query, I added the following settings
to the configuration:
spark.sql.shuffle.partitions=1000
spark.dynamicAllocation.maxExecutors=200

There wasn't any significant improvement. I'm looking for any ideas to reduce
the running time.


Thanks!
Tzahi
