spark-issues mailing list archives

From "Jianshi Huang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6201) INSET should coerce types
Date Mon, 09 Mar 2015 17:39:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353248#comment-14353248 ]

Jianshi Huang commented on SPARK-6201:
--------------------------------------

Implicit coercion outside the Numeric domain is quite evil. I don't think Hive's behavior
makes sense here. 

Raising an exception is fine in this case. If you want to make it Hive-compliant, then
please consider adding a switch, say

  spark.sql.strict_mode = true(default) / false

Jianshi
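
To make the suggestion concrete, here is a minimal plain-Scala sketch of how such a switch could behave. It does not use Spark's API: `SqlConf`, `strictMode`, and `stringEqualsInt` are hypothetical names, and `spark.sql.strict_mode` is only the setting proposed above, not an existing Spark option.

```scala
object StrictModeSketch {
  // Hypothetical stand-in for the proposed spark.sql.strict_mode setting.
  // Strict is the default, as suggested in the comment above.
  final case class SqlConf(strictMode: Boolean = true)

  // Compares a string column value with an integer constant.
  // Strict mode: refuse the comparison instead of coercing silently.
  // Lenient mode: cast both sides to Double and compare; a string that
  // does not parse as a number simply does not match.
  def stringEqualsInt(s: String, i: Int, conf: SqlConf): Boolean =
    if (conf.strictMode)
      throw new IllegalArgumentException(
        "cannot compare a String with an Int without an explicit cast")
    else
      scala.util.Try(s.trim.toDouble).toOption.contains(i.toDouble)
}
```

With `SqlConf(strictMode = false)` the call `stringEqualsInt("1", 1, conf)` matches, while the default configuration raises an exception for the same call.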

> INSET should coerce types
> -------------------------
>
>                 Key: SPARK-6201
>                 URL: https://issues.apache.org/jira/browse/SPARK-6201
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.3.0, 1.2.1
>            Reporter: Jianshi Huang
>
> Suppose we have the following table:
> {code}
> sqlc.jsonRDD(sc.parallelize(Seq("{\"a\": \"1\"}", "{\"a\": \"2\"}", "{\"a\": \"3\"}"))).registerTempTable("d")
> {code}
> The schema is
> {noformat}
> root
>  |-- a: string (nullable = true)
> {noformat}
> Then,
> {code}
> sql("select * from d where (d.a = 1 or d.a = 2)").collect
> =>
> Array([1], [2])
> {code}
> where d.a and the constants 1 and 2 are first cast to Double and then compared, as the plan shows:
> {noformat}
> Filter ((CAST(a#155, DoubleType) = CAST(1, DoubleType)) || (CAST(a#155, DoubleType) = CAST(2, DoubleType)))
> {noformat}
> However, if I use
> {code}
> sql("select * from d where d.a in (1,2)").collect
> {code}
> The result is empty.
> The physical plan shows it's using INSET:
> {noformat}
> == Physical Plan ==
> Filter a#155 INSET (1,2)
>  PhysicalRDD [a#155], MappedRDD[499] at map at JsonRDD.scala:47
> {noformat}
> *It seems the INSET implementation in SparkSQL doesn't coerce types implicitly, whereas Hive's does.
> We should make SparkSQL conform to Hive's behavior, even though implicit coercion between
> String and Int in a comparison is very confusing.*
> Jianshi
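
The difference between the two plans in the report can be modelled in plain Scala, without Spark. This is only a sketch of the idea, not the Catalyst implementation: `inSetRaw`, `inSetCoerced`, and `toDouble` are hypothetical helpers, and casting to Double is assumed here simply because that is what the `=` plan above does.

```scala
object InSetCoercion {
  // Uncoerced membership: plain equality, so the string "1" never
  // equals the integer 1 and the filter matches nothing.
  def inSetRaw(value: Any, set: Set[Any]): Boolean = set.contains(value)

  // Best-effort cast to Double; None models a failed cast.
  def toDouble(value: Any): Option[Double] = value match {
    case n: Int    => Some(n.toDouble)
    case n: Long   => Some(n.toDouble)
    case n: Double => Some(n)
    case s: String => scala.util.Try(s.trim.toDouble).toOption
    case _         => None
  }

  // Coerced membership: cast the column value and every set element
  // to Double before comparing, mirroring the plan produced for `=`.
  def inSetCoerced(value: Any, set: Set[Any]): Boolean =
    toDouble(value).exists(v => set.exists(e => toDouble(e).contains(v)))
}
```

For the string value "1" and the set (1, 2), `inSetRaw` finds no match, which corresponds to the empty result reported for the IN query, while `inSetCoerced` matches in the same way the `d.a = 1 or d.a = 2` query does.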



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


