spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Performance Issue
Date Wed, 09 Jan 2019 06:21:33 GMT
Hi,

Can you please let us know the Spark version, the query itself, whether
the data is in Parquet format or not, and where it is stored?

Regards,
Gourav Sengupta

On Wed, Jan 9, 2019 at 1:53 AM 大啊 <beliefer@163.com> wrote:

> What is your performance issue?
>
> At 2019-01-08 22:09:24, "Tzahi File" <tzahi.file@ironsrc.com> wrote:
>
> Hello,
>
> I have some performance issue running SQL query on Spark.
>
> The query joins one Parquet table partitioned by date, where each
> partition is about 200 GB, with a simple table of about 100 records. The
> Spark cluster is of type m5.2xlarge - 8 cores. I'm using the Qubole
> interface to run the SQL query.
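>
> Since the small table has only about 100 records, one common approach is
> to broadcast it to the executors so the large Parquet table is never
> shuffled for the join. A minimal Scala sketch of that idea follows; the
> paths, the partition filter, and the join key are hypothetical
> placeholders, not the actual schema:
>
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.functions.{broadcast, col}
>
>   val spark = SparkSession.builder()
>     .appName("broadcast-join-sketch")
>     .getOrCreate()
>
>   // Large table, partitioned by date; filtering on the partition column
>   // lets Spark prune partitions instead of scanning all of them.
>   val big = spark.read.parquet("s3://bucket/big_table")
>     .filter(col("dt") === "2019-01-08")
>
>   // Small table (~100 rows), cheap to ship to every executor.
>   val small = spark.read.parquet("s3://bucket/small_table")
>
>   // broadcast() hints Spark to replicate the small side of the join
>   // instead of shuffling both sides.
>   val joined = big.join(broadcast(small), Seq("some_key"))
>   joined.write.parquet("s3://bucket/output")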
>
> After searching for ways to improve my query, I added the following
> settings to the configuration:
> spark.sql.shuffle.partitions=1000
> spark.dynamicAllocation.maxExecutors=200
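>
> For reference, a minimal sketch of how those two settings (plus dynamic
> allocation itself, which maxExecutors only affects when it is enabled)
> could be applied at session creation; spark.sql.autoBroadcastJoinThreshold
> is also shown because it is the setting that decides whether a ~100-row
> table gets broadcast automatically (default 10 MB):
>
>   import org.apache.spark.sql.SparkSession
>
>   // Values are the ones quoted above, not a tuning recommendation.
>   val spark = SparkSession.builder()
>     .appName("tuning-sketch")
>     .config("spark.sql.shuffle.partitions", "1000")
>     .config("spark.dynamicAllocation.enabled", "true")
>     .config("spark.dynamicAllocation.maxExecutors", "200")
>     // 10485760 bytes = 10 MB, the default broadcast threshold.
>     .config("spark.sql.autoBroadcastJoinThreshold", "10485760")
>     .getOrCreate()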
>
> There wasn't any significant improvement. I'm looking for any ideas to
> reduce the running time.
>
>
> Thanks!
> Tzahi
>
