spark-issues mailing list archives

From "Aaron Staple (JIRA)" <>
Subject [jira] [Created] (SPARK-2781) Analyzer should check resolution of LogicalPlans
Date Fri, 01 Aug 2014 02:08:38 GMT
Aaron Staple created SPARK-2781:

             Summary: Analyzer should check resolution of LogicalPlans
                 Key: SPARK-2781
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Aaron Staple

Currently the Analyzer’s CheckResolution rule checks that all attributes are resolved by
searching for unresolved Expressions.  But some LogicalPlans, including Union, override the
resolved attribute with custom implementations that validate other criteria in addition to
checking attribute resolution of their descendants.  These plan-level checks are not currently
exercised by the CheckResolution implementation.
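
As a rough illustration of this gap, here is a minimal sketch using a toy model rather than
Catalyst's real classes.  Every name in it (Expr, Attr, Plan, Project, this simplified Union,
allExpressionsResolved) is a hypothetical stand-in, and data types are plain strings.  It only
shows how an expression-only search can pass a plan whose own resolved override fails.

```scala
object ResolutionSketch {
  // Toy model, NOT Catalyst's real classes: all names here are hypothetical.
  sealed trait Expr { def resolved: Boolean }
  case class Attr(name: String, dataType: String) extends Expr { val resolved = true }
  case class UnresolvedAttr(name: String) extends Expr { val resolved = false }

  sealed trait Plan {
    def expressions: Seq[Expr]
    def children: Seq[Plan]
    // Default: resolved when own expressions and all descendants are resolved.
    def resolved: Boolean =
      expressions.forall(_.resolved) && children.forall(_.resolved)
  }

  case class Project(expressions: Seq[Expr], children: Seq[Plan] = Nil) extends Plan

  case class Union(left: Plan, right: Plan) extends Plan {
    val expressions: Seq[Expr] = Nil
    val children: Seq[Plan] = Seq(left, right)
    // Custom override: additionally require matching column types on both sides.
    override def resolved: Boolean =
      children.forall(_.resolved) && outputTypes(left) == outputTypes(right)
    private def outputTypes(p: Plan): Seq[String] =
      p.expressions.collect { case Attr(_, t) => t }
  }

  // Expression-only check: walks the tree looking for unresolved expressions
  // and never consults any plan's own `resolved` member.
  def allExpressionsResolved(p: Plan): Boolean =
    p.expressions.forall(_.resolved) && p.children.forall(allExpressionsResolved)

  val arrays = Project(Seq(Attr("value", "array<int>")))
  val ints   = Project(Seq(Attr("2", "int")))
  val union  = Union(arrays, ints)
}
```

In this toy model, ResolutionSketch.allExpressionsResolved(ResolutionSketch.union) is true while
ResolutionSketch.union.resolved is false, which mirrors the situation described above.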

As a result, it is currently possible to execute a query generated from unresolved LogicalPlans.
One example is a UNION query that produces rows with different data types in the same column:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class T1(value:Seq[Int])
val t1 = sc.parallelize(Seq(T1(Seq(0,1))))
// Register the RDD as a table so that the SQL query below can refer to t1
t1.registerAsTable("t1")
sqlContext.sql("SELECT value FROM t1 UNION SELECT 2 FROM t1").collect()

In this example, the type coercion implementation cannot unify array and integer types, yet
no error is raised.  One row contains an array in the returned column and the other row
contains an integer.  The result is:

res3: Array[org.apache.spark.sql.Row] = Array([List(0, 1)], [2])

I believe fixing this is a first step toward improving validation for Union (and similar)
plans.  (For instance, Union does not currently validate that its children contain the same
number of columns.)
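
To make the kind of validation meant here concrete, the following is a minimal sketch of a
check on the two sides of a union.  It is an assumption-laden illustration, not Spark's
implementation: Schema and unionError are hypothetical names, column types are modeled as
plain strings, and types must match exactly (a real check would presumably run after type
coercion has had a chance to widen compatible types).

```scala
object UnionCheckSketch {
  // Hypothetical stand-in for a plan's output schema: one type name per column.
  type Schema = Seq[String]

  // Returns None when the two sides are compatible,
  // otherwise a description of the first problem found.
  def unionError(left: Schema, right: Schema): Option[String] =
    if (left.length != right.length)
      Some(s"column count mismatch: ${left.length} vs ${right.length}")
    else
      left.zip(right).zipWithIndex.collectFirst {
        case ((l, r), i) if l != r => s"column $i type mismatch: $l vs $r"
      }
}
```

For the query above, UnionCheckSketch.unionError(Seq("array<int>"), Seq("int")) reports a type
mismatch in column 0, and sides with differing column counts are rejected before any types are
compared.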

This message was sent by Atlassian JIRA
