spark-issues mailing list archives

From "Herman van Hovell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-14986) Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all inner columns are projected out
Date Wed, 04 May 2016 18:13:12 GMT

    [ https://issues.apache.org/jira/browse/SPARK-14986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271120#comment-15271120 ]

Herman van Hovell commented on SPARK-14986:
-------------------------------------------

I have taken a look at this. Your query yields the following plan:
{noformat}
== Parsed Logical Plan ==
'Project ['nil]
+- 'Generate 'EXPLODE('array()), true, true, Some(n), ['nil]
   +- SubqueryAlias x
      +- Project [1 AS x#0]
         +- OneRowRelation$

== Analyzed Logical Plan ==
nil: null
Project [nil#6]
+- Generate explode(array()), true, true, Some(n), [nil#6]
   +- SubqueryAlias x
      +- Project [1 AS x#0]
         +- OneRowRelation$

== Optimized Logical Plan ==
Generate explode([]), false, true, Some(n), [nil#6]
+- OneRowRelation$

== Physical Plan ==
Generate explode([]), false, true, [nil#6]
+- Scan OneRowRelation[]
{noformat}

The optimizer set the {join} flag to false because no fields from the first relation ({select
1 as x}) are used. See: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L365

Setting the join flag to false triggers a different code path. This code path emits all the
rows in the generated relation for each input row. It does not return any rows if the generated
relation is empty, which is what you are seeing. The other code path would generate a row because
it performs a left-join-like operation on the generated results.
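
To make the difference between the two code paths concrete, here is a minimal Python model of the behavior described above. It is a sketch only and not Spark's implementation: the function {generate}, its parameters, and the tuple-based rows are assumptions made for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[object, ...]


def generate(
    child_rows: Iterable[Row],
    generator: Callable[[Row], Iterable[Row]],
    join: bool,
    outer: bool,
    generator_width: int = 1,
) -> List[Row]:
    """Simplified model of a Generate node (hypothetical, for illustration)."""
    out: List[Row] = []
    for row in child_rows:
        generated = list(generator(row))
        if join:
            # Left-join-like path: with outer=True, an empty generator result
            # still yields one row, padded with nulls.
            if not generated and outer:
                generated = [(None,) * generator_width]
            out.extend(row + g for g in generated)
        else:
            # join=False path: emit only the generated rows. In this model the
            # outer flag is ignored here, so an empty result yields nothing.
            out.extend(generated)
    return out


def explode_empty_array(row: Row) -> List[Row]:
    # Stands in for EXPLODE(array()): it never produces a row.
    return []


child = [(1,)]  # models (select 1 as x)
with_join = generate(child, explode_empty_array, join=True, outer=True)
without_join = generate(child, explode_empty_array, join=False, outer=True)
```

In this model {with_join} holds the single row (1, None), which matches the query that also selects x, while {without_join} is empty, which matches the reported result.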

This is only a problem for {OUTER} lateral views. We could add the {outer} flag to the optimizer
rule. Does anyone know what the default behavior of Hive is?
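
One possible reading of "add the {outer} flag to the optimizer rule" is sketched below in Python. This is an assumption about the shape of the fix, not the code in Optimizer.scala: the {Generate} dataclass, {prune_join}, and {child_columns_referenced} are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Generate:
    """Hypothetical stand-in for the plan node, reduced to the two flags."""
    join: bool
    outer: bool


def prune_join(node: Generate, child_columns_referenced: bool) -> Generate:
    # Drop the join only when no child columns are needed AND the generator
    # is not OUTER; an OUTER generator keeps join=True so that an empty
    # generator result can still produce a null-padded row.
    if node.join and not node.outer and not child_columns_referenced:
        return replace(node, join=False)
    return node


outer_case = prune_join(Generate(join=True, outer=True), child_columns_referenced=False)
inner_case = prune_join(Generate(join=True, outer=False), child_columns_referenced=False)
```

Under this sketch {outer_case} keeps join=True, whereas {inner_case} is rewritten to join=False as before.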

> Spark SQL returns incorrect results for LATERAL VIEW OUTER queries if all inner columns are projected out
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-14986
>                 URL: https://issues.apache.org/jira/browse/SPARK-14986
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Andrey Balmin
>
> Repro:   using Hive context, run this SQL query:
>    select  nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array ()) n as nil
> Actual result:             returns 0 rows.
> Expected results:      should return 1 row with null value.
> Details:
> If the query is modified to also return x:
>    select x, nil from (select 1 as x ) x LATERAL VIEW OUTER EXPLODE( array ()) n as nil
> it works correctly and returns 1 row: [ 1, null ]
> Clearly, changing the SELECT clause of a query should not change the number of rows it returns.
> Looking at the query plan, it seems that the Generator object was (incorrectly) marked with "join=false".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


