spark-issues mailing list archives

From "Xiaoju Wu (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-30298) bucket join cannot work for self-join with views
Date Wed, 18 Dec 2019 13:34:00 GMT
Xiaoju Wu created SPARK-30298:
---------------------------------

             Summary: bucket join cannot work for self-join with views
                 Key: SPARK-30298
                 URL: https://issues.apache.org/jira/browse/SPARK-30298
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiaoju Wu


The following unit test may fail at its last assertion:
{code:java}
test("bucket join cannot work for self-join with views") {
    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "1") {
      withTable("t1") {
        val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
        df.write
          .format("parquet")
          .bucketBy(8, "i")
          .saveAsTable("t1")

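        // The view wraps t1 in a Project whose output attributes are aliases.
        sql("CREATE VIEW v1 AS SELECT * FROM t1")

        // Self-join on the bucketed table: no shuffle is expected.
        val plan1 = sql("SELECT * FROM t1 a JOIN t1 b ON a.i = b.i").queryExecution.executedPlan
        assert(plan1.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)

        // Joining the table with its view should also avoid a shuffle, but this assertion fails.
        val plan2 = sql("SELECT * FROM t1 a JOIN v1 b ON a.i = b.i").queryExecution.executedPlan
        assert(plan2.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)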
      }
    }
  }
{code}

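This happens because the view adds a Project with Alias expressions, so the join's
requiredChildDistribution is expressed in terms of the aliased attributes, while ProjectExec
passes its child's outputPartitioning up unchanged, without rewriting it through those aliases.
As a result, the satisfies check in EnsureRequirements fails and a shuffle is inserted.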
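
The mismatch can be illustrated with a small standalone model. This is only a sketch: `Attr`, `Alias`, `HashPartitioning`, `ClusteredDistribution`, `passThrough` and `remap` below are hypothetical simplifications written for this example, not Spark's real classes or methods, and `remap` shows one conceivable way to reconcile the two sides, not necessarily the fix adopted for this issue.

```scala
// Toy model only: these types mimic the shape of Spark's planner classes
// but are hypothetical simplifications, not Spark's real API.
final case class Attr(name: String, exprId: Long)
final case class Alias(child: Attr, output: Attr)
final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)
final case class ClusteredDistribution(keys: Seq[Attr])

object AliasPartitioningSketch {
  // A hash partitioning satisfies a clustered distribution when every
  // partitioning key is one of the required clustering keys.
  def satisfies(p: HashPartitioning, d: ClusteredDistribution): Boolean =
    p.keys.forall(k => d.keys.contains(k))

  // Behaviour described in the report: the projection reports the child's
  // partitioning unchanged, still keyed on the pre-alias attribute.
  def passThrough(child: HashPartitioning, aliases: Seq[Alias]): HashPartitioning =
    child

  // Assumed remedy for illustration: rewrite each partitioning key through
  // the projection's aliases so it refers to the output attributes.
  def remap(child: HashPartitioning, aliases: Seq[Alias]): HashPartitioning = {
    val mapping = aliases.map(a => a.child -> a.output).toMap
    child.copy(keys = child.keys.map(k => mapping.getOrElse(k, k)))
  }

  // t1.i as produced by the bucketed scan, and the aliased i exposed by view v1.
  val tableI: Attr = Attr("i", 1L)
  val viewI: Attr = Attr("i", 2L)

  val scanPartitioning: HashPartitioning = HashPartitioning(Seq(tableI), 8)
  val viewAliases: Seq[Alias] = Seq(Alias(tableI, viewI))

  // The join is planned against the view's output, so it asks for clustering
  // on the aliased attribute.
  val joinRequirement: ClusteredDistribution = ClusteredDistribution(Seq(viewI))
}
```

In this model, `passThrough` leaves the partitioning keyed on the scan's attribute, so `satisfies` returns false against the join's requirement, which corresponds to a shuffle being inserted. After `remap`, the keys refer to the view's output attribute and the same check succeeds.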





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


