spark-issues mailing list archives

From "Cheng Hao (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation
Date Thu, 05 Mar 2015 07:09:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348293#comment-14348293 ]

Cheng Hao edited comment on SPARK-5791 at 3/5/15 7:08 AM:
----------------------------------------------------------

I think this is a typical case where we need to optimize the join against the dimension tables,
since a large share of the data is filtered out by the join condition.

In this case, it's possible that most of the records in the fact table 'inv' are filtered out
by the following join condition:
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
    WHERE datediff(d_date, '2001-05-08') >= -30
    AND datediff(d_date, '2001-05-08') <= 30
{panel}
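
To illustrate the idea, here is a minimal sketch in plain Python (not Spark code, and not how the Spark SQL planner actually works). It assumes the date predicate can be applied to date_dim on its own before the join, so that only the surviving d_date_sk keys are used to probe 'inv'. The function name filter_then_join and the toy tables are hypothetical and exist only for this example.

```python
from datetime import date, timedelta

def filter_then_join(fact_rows, dim_rows, center, window_days=30):
    """Filter the small dimension table first, then probe the fact table.

    fact_rows: iterable of (inv_date_sk, quantity) tuples (toy stand-in for 'inv')
    dim_rows:  iterable of (d_date_sk, d_date) tuples (toy stand-in for date_dim)
    """
    lo = center - timedelta(days=window_days)
    hi = center + timedelta(days=window_days)
    # Step 1: apply the date-window predicate to the dimension table only.
    wanted = {sk: d for sk, d in dim_rows if lo <= d <= hi}
    # Step 2: scan the fact table once, keeping rows whose key survived step 1.
    return [(sk, qty, wanted[sk]) for sk, qty in fact_rows if sk in wanted]

# Toy data (made up): one date_dim row per day of 2001, three inv rows per day.
date_dim = [(i, date(2001, 1, 1) + timedelta(days=i - 1)) for i in range(1, 366)]
inv = [(sk, qty) for sk, _ in date_dim for qty in (10, 20, 30)]

joined = filter_then_join(inv, date_dim, date(2001, 5, 8))
print(len(inv), len(joined))
```

With this toy data, only the fact rows whose date key falls inside the 61-day window survive, which is the kind of reduction the comment above is describing.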


was (Author: chenghao):
I think this is a typical case that we need to optimize the join for the dimension tables,
as they have lots of the data are filtered out with the join condition.

In this case it's possible most of data are filtered for the join condition of 
{panel}
JOIN date_dim d ON inv.inv_date_sk = d.d_date_sk
    WHERE datediff(d_date, '2001-05-08') >= -30
    AND datediff(d_date, '2001-05-08') <= 30
{panel}

> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5791
>                 URL: https://issues.apache.org/jira/browse/SPARK-5791
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Yi Zhou
>         Attachments: Physcial_Plan_Hive.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables do join operation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

