spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation
Date Thu, 05 Mar 2015 17:31:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349139#comment-14349139
] 

Yin Huai commented on SPARK-5791:
---------------------------------

[~jameszhouyi] Thank you for the updated physical plan. What file format do those tables
use, ORC or Parquet? Also, which version of Spark are you running? If the tables are Parquet,
HiveTableScan is not as efficient as our native Parquet support (ParquetRelation2 in Spark SQL).
In fact, if you are using Spark 1.3 and the data is stored as Parquet, you should not see
HiveTableScan when reading Parquet data.
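
One quick way to check a saved physical plan for this is to count the scan operators it
mentions. The following is only a minimal sketch: the helper name count_scan_operators and
the sample plan text are hypothetical and made up for illustration; only the operator names
HiveTableScan and ParquetRelation2 come from the comment above.

```python
import re

# Operator names taken from the comment above.
SCAN_OPERATORS = ("HiveTableScan", "ParquetRelation2")


def count_scan_operators(plan_text):
    """Count how often each scan operator name appears in a physical plan dump."""
    return {
        name: len(re.findall(r"\b" + re.escape(name) + r"\b", plan_text))
        for name in SCAN_OPERATORS
    }


# Hypothetical plan excerpt, invented for illustration (not real Spark output).
sample_plan = """\
Project [c_name#1]
 ShuffledHashJoin [c_id#0], [o_cust#5], BuildRight
  HiveTableScan [c_id#0,c_name#1], (MetastoreRelation default, customer, None), None
  HiveTableScan [o_cust#5], (MetastoreRelation default, orders, None), None
"""

counts = count_scan_operators(sample_plan)
print(counts)
```

A non-zero HiveTableScan count for tables stored as Parquet would suggest the native Parquet
path is not being used for those scans.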

> [Spark SQL] show poor performance when multiple table do join operation
> -----------------------------------------------------------------------
>
>                 Key: SPARK-5791
>                 URL: https://issues.apache.org/jira/browse/SPARK-5791
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Yi Zhou
>         Attachments: Physcial_Plan_Hive.txt, Physcial_Plan_SparkSQL_Updated.txt, Physical_Plan.txt
>
>
> Spark SQL shows poor performance when multiple tables are joined.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


