spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Nattavut Sutyanyong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-19047) Invalid correlated column may not be reported as an error
Date Sun, 01 Jan 2017 16:13:58 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Nattavut Sutyanyong updated SPARK-19047:
----------------------------------------
    Description: 
[subquery/in-subquery/in-group-by.sql TC 01.12]

{code}
Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")

— TC 01.12
select * from t1 where t1a in
(select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
(select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))

== Parsed Logical Plan ==
'Project [*]
+- 'Filter 't1a IN (list#803)
   :  +- 'Filter 't2c IN (list#802)
   :     :  +- 'Filter (('t2b > 6) && ('t3b > 't2b))
   :     :     +- 'Aggregate ['t3c, 't3b], ['t3c]
   :     :        +- 'UnresolvedRelation `t3`
   :     +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
   :        +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
   :           +- 'UnresolvedRelation `t2`
   +- 'UnresolvedRelation `t1`

== Analyzed Logical Plan ==
t1a: int, t1b: int, t1c: int
Project [t1a#764, t1b#765, t1c#766]
+- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
   :  +- Project [min(t2a)#816]
   :     +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 > 6)
&& (t3b#795 > t2b#780)]
   :        :  +- Project [t3c#796, t3b#795]
   :        :     +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
   :        :        +- SubqueryAlias t3, `t3`
   :        :           +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
   :        :              +- LocalRelation [_1#790, _2#791, _3#792]
   :        +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
   :           +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
   :              +- SubqueryAlias t2, `t2`
   :                 +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
   :                    +- LocalRelation [_1#775, _2#776, _3#777]
   +- SubqueryAlias t1, `t1`
      +- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
         +- LocalRelation [_1#760, _2#761, _3#762]

== Optimized Logical Plan ==
Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
+- Join LeftSemi, (_1#760 = min(t2a)#816)
   :- LocalRelation [_1#760, _2#761, _3#762]
   +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
      +- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
         +- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) && (t3b#795
> _2#776))
            :- Filter (_3#777 >= 1)
            :  +- LocalRelation [_1#775, _2#776, _3#777]
            +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
               +- LocalRelation [t3b#795, t3c#796]
{code}

I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator.
Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This
does not look right to me. 

> Invalid correlated column may not be reported as an error
> ---------------------------------------------------------
>
>                 Key: SPARK-19047
>                 URL: https://issues.apache.org/jira/browse/SPARK-19047
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nattavut Sutyanyong
>
> [subquery/in-subquery/in-group-by.sql TC 01.12]
> {code}
> Seq((1,1,1)).toDF("t1a", "t1b", "t1c").createOrReplaceTempView("t1")
> Seq((1,1,1)).toDF("t2a", "t2b", "t2c").createOrReplaceTempView("t2")
> Seq((1,1,1)).toDF("t3a", "t3b", "t3c").createOrReplaceTempView("t3")
> — TC 01.12
> select * from t1 where t1a in
> (select min(t2a) from t2 where t2a = t2a and t2c >= 1 group by t2c having t2c in
> (select t3c from t3 group by t3c, t3b having t2b > 6 and t3b > t2b ))
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Filter 't1a IN (list#803)
>    :  +- 'Filter 't2c IN (list#802)
>    :     :  +- 'Filter (('t2b > 6) && ('t3b > 't2b))
>    :     :     +- 'Aggregate ['t3c, 't3b], ['t3c]
>    :     :        +- 'UnresolvedRelation `t3`
>    :     +- 'Aggregate ['t2c], [unresolvedalias('min('t2a), None)]
>    :        +- 'Filter (('t2a = 't2a) && ('t2c >= 1))
>    :           +- 'UnresolvedRelation `t2`
>    +- 'UnresolvedRelation `t1`
> == Analyzed Logical Plan ==
> t1a: int, t1b: int, t1c: int
> Project [t1a#764, t1b#765, t1c#766]
> +- Filter predicate-subquery#803 [(t1a#764 = min(t2a)#816)]
>    :  +- Project [min(t2a)#816]
>    :     +- !Filter predicate-subquery#802 [(t2c#781 = t3c#796) && (t2b#780 >
6) && (t3b#795 > t2b#780)]
>    :        :  +- Project [t3c#796, t3b#795]
>    :        :     +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
>    :        :        +- SubqueryAlias t3, `t3`
>    :        :           +- Project [_1#790 AS t3a#794, _2#791 AS t3b#795, _3#792 AS t3c#796]
>    :        :              +- LocalRelation [_1#790, _2#791, _3#792]
>    :        +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816, t2c#781]
>    :           +- Filter ((t2a#779 = t2a#779) && (t2c#781 >= 1))
>    :              +- SubqueryAlias t2, `t2`
>    :                 +- Project [_1#775 AS t2a#779, _2#776 AS t2b#780, _3#777 AS t2c#781]
>    :                    +- LocalRelation [_1#775, _2#776, _3#777]
>    +- SubqueryAlias t1, `t1`
>       +- Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
>          +- LocalRelation [_1#760, _2#761, _3#762]
> == Optimized Logical Plan ==
> Project [_1#760 AS t1a#764, _2#761 AS t1b#765, _3#762 AS t1c#766]
> +- Join LeftSemi, (_1#760 = min(t2a)#816)
>    :- LocalRelation [_1#760, _2#761, _3#762]
>    +- Aggregate [t2c#781], [min(t2a#779) AS min(t2a)#816]
>       +- Project [_1#775 AS t2a#779, _3#777 AS t2c#781]
>          +- Join LeftSemi, (((_3#777 = t3c#796) && (_2#776 > 6)) &&
(t3b#795 > _2#776))
>             :- Filter (_3#777 >= 1)
>             :  +- LocalRelation [_1#775, _2#776, _3#777]
>             +- Aggregate [t3c#796, t3b#795], [t3c#796, t3b#795]
>                +- LocalRelation [t3b#795, t3c#796]
> {code}
> I don't see the column t2b#780 being part of the output of the (lower) Aggregate operator.
Somehow the LeftSemi join for t2b > 6 is just moved down below the Aggregate over t2. This
does not look right to me. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message