spark-issues mailing list archives

From "Xiaoju Wu (Jira)"
Subject [jira] [Created] (SPARK-30298) bucket join cannot work for self-join with views
Date Wed, 18 Dec 2019 13:34:00 GMT
Xiaoju Wu created SPARK-30298:

             Summary: bucket join cannot work for self-join with views
                 Key: SPARK-30298
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Xiaoju Wu

The following unit test may fail at its last assertion, the one on the join against the view v1:
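test("bucket join cannot work for self-join with views") {
  withTable("t1") {
    withView("v1") {
      // Assumed step, missing from the snippet as posted: save the data as a table bucketed on "i".
      val df = (0 until 20).map(i => (i, i)).toDF("i", "j").as("df")
      df.write.bucketBy(8, "i").saveAsTable("t1")

      sql("create view v1 as select * from t1").collect()

      // Self-join on the bucketed table: no shuffle is expected.
      val plan1 = sql("SELECT * FROM t1 a JOIN t1 b ON a.i = b.i").queryExecution.executedPlan
      assert(plan1.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)

      // Join of the table with a view over it: this is the assertion that may fail.
      val plan2 = sql("SELECT * FROM t1 a JOIN v1 b ON a.i = b.i").queryExecution.executedPlan
      assert(plan2.collect { case exchange: ShuffleExchangeExec => exchange }.isEmpty)
    }
  }
}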

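The cause is that View adds a Project with Alias expressions, so the Join's
requiredChildDistribution is expressed in terms of the aliased attributes, while ProjectExec
passes its child's outputPartitioning up unchanged, still in terms of the un-aliased attributes.
The satisfies check in EnsureRequirements therefore fails and a shuffle is inserted.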
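
The mismatch can be illustrated with a small self-contained sketch. This is a toy model, not Spark code: Attr, Alias, HashPartitioning, ClusteredDistribution, satisfies and throughAliases below are hypothetical stand-ins that only mimic the shape of the planner's check, and the expression ids 1 and 5 are made up. It assumes that attributes are compared by expression id, so an aliased column does not equal the column it was derived from.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
object AliasPartitioningSketch {
  // An attribute is identified by its expression id, not only by its name.
  final case class Attr(name: String, exprId: Int)

  // A Project entry that re-exposes `child` under a fresh expression id.
  final case class Alias(child: Attr, output: Attr)

  // Rows are hash-partitioned on `keys` into `numPartitions` buckets.
  final case class HashPartitioning(keys: Seq[Attr], numPartitions: Int)

  // What one side of the join requires: rows clustered by `keys`.
  final case class ClusteredDistribution(keys: Seq[Attr])

  // Simplified check: every partitioning key must be one of the required keys.
  def satisfies(p: HashPartitioning, d: ClusteredDistribution): Boolean =
    p.keys.nonEmpty && p.keys.forall(d.keys.contains)

  // One conceivable remedy (an assumption, not necessarily what Spark does):
  // rewrite the child's partitioning keys through the Project's aliases.
  def throughAliases(p: HashPartitioning, aliases: Seq[Alias]): HashPartitioning = {
    val mapping = aliases.map(a => a.child -> a.output).toMap
    p.copy(keys = p.keys.map(k => mapping.getOrElse(k, k)))
  }

  val scanI = Attr("i", 1)                    // column i as read from the bucketed table
  val viewI = Attr("i", 5)                    // column i as exposed by the view's Project
  val viewProject = Seq(Alias(scanI, viewI))

  val childPartitioning = HashPartitioning(Seq(scanI), 8)
  val joinRequirement = ClusteredDistribution(Seq(viewI))

  // Partitioning passed up unchanged: keyed on i#1, while the join asks for i#5.
  val passedUpUnchanged: Boolean = satisfies(childPartitioning, joinRequirement)

  // Partitioning rewritten through the aliases: keyed on i#5.
  val rewritten: Boolean =
    satisfies(throughAliases(childPartitioning, viewProject), joinRequirement)

  def main(args: Array[String]): Unit = {
    println(s"passed up unchanged satisfies the join requirement: $passedUpUnchanged")
    println(s"rewritten through aliases satisfies the join requirement: $rewritten")
  }
}
```

In this model the unchanged partitioning does not satisfy the join's requirement, which corresponds to the extra ShuffleExchangeExec seen in plan2, while the alias-aware version does.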

