spark-issues mailing list archives

From "Ke Jia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-26155) Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
Date Sun, 09 Dec 2018 03:35:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Jia updated SPARK-26155:
---------------------------
    Attachment: tpcds.result.xlsx

> Spark SQL performance degradation after apply SPARK-21052 with Q19 of TPC-DS in 3TB scale
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-26155
>                 URL: https://issues.apache.org/jira/browse/SPARK-26155
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>            Reporter: Ke Jia
>            Priority: Major
>         Attachments: Q19 analysis in Spark2.3 with L486&487.pdf, Q19 analysis in Spark2.3 without L486&487.pdf, q19.sql, tpcds.result.xlsx
>
>
> In our test environment, we found a serious performance degradation in Spark 2.3 when running TPC-DS on SKX 8180. Several queries are affected. For example, on 3TB of data, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1. We investigated the problem and found that the root cause is the community patch SPARK-21052, which adds metrics to the hash join process. The affected code is [L486|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L486] and [L487|https://github.com/apache/spark/blob/1d3dd58d21400b5652b75af7e7e53aad85a31528/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L487]. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

