spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Min Qiu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-12114) ColumnPruning rule fails in case of "Project <- Filter <- Join"
Date Thu, 03 Dec 2015 01:15:11 GMT
Min Qiu created SPARK-12114:
-------------------------------

             Summary: ColumnPruning rule fails in case of "Project <- Filter <- Join"
                 Key: SPARK-12114
                 URL: https://issues.apache.org/jira/browse/SPARK-12114
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Min Qiu


For the query
{quote}
SELECT c_name, c_custkey, o_orderkey, o_orderdate, 
       o_totalprice, sum(l_quantity) 
FROM customer join orders join lineitem 
      on c_custkey = o_custkey AND o_orderkey = l_orderkey 
     left outer join (SELECT l_orderkey tmp_orderkey 
                      FROM lineitem 
                      GROUP BY l_orderkey 
                      HAVING sum(l_quantity) > 300) tmp 
      on o_orderkey = tmp_orderkey 
WHERE tmp_orderkey IS NOT NULL 
GROUP BY c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice 
ORDER BY o_totalprice DESC, o_orderdate
{quote}

The optimizedPlan is 
{quote}
Sort \[o_totalprice#48 DESC,o_orderdate#49 ASC]
 
 Aggregate \[c_name#38,c_custkey#37,o_orderkey#45,o_orderdate#49,o_totalprice#48], \[c_name#38,c_custkey#37,o_orderkey#45,
               o_orderdate#49,o_totalprice#48,SUM(l_quantity#58) AS _c5#36]
  {color: green}Project \[c_name#38,o_orderdate#49,c_custkey#37,o_orderkey#45,o_totalprice#48,l_quantity#58]
   Filter IS NOT NULL tmp_orderkey#35
    Join LeftOuter, Some((o_orderkey#45 = tmp_orderkey#35)){color}
     Join Inner, Some((c_custkey#37 = o_custkey#46))
      MetastoreRelation default, customer, None
      Join Inner, Some((o_orderkey#45 = l_orderkey#54))
       MetastoreRelation default, orders, None
       MetastoreRelation default, lineitem, None
     Project \[tmp_orderkey#35]
      Filter havingCondition#86
       Aggregate \[l_orderkey#70], \[(SUM(l_quantity#74) > 300.0) AS havingCondition#86,l_orderkey#70
AS tmp_orderkey#35]
        Project \[l_orderkey#70,l_quantity#74]
         MetastoreRelation default, lineitem, None
{quote}

Due to the pattern highlighted in green that the ColumnPruning rule fails to deal with,  all
columns of lineitem and orders tables are scanned. The unneeded columns are also involved
in the data Shuffling. The performance is extremely bad if any one of the two tables is big.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message