spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-26707) Insert into table with single struct column fails
Date Thu, 07 Feb 2019 18:31:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26707:
------------------------------------

    Assignee: Apache Spark

> Insert into table with single struct column fails
> -------------------------------------------------
>
>                 Key: SPARK-26707
>                 URL: https://issues.apache.org/jira/browse/SPARK-26707
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.3, 2.3.2, 2.4.0, 3.0.0
>            Reporter: Bruce Robbins
>            Assignee: Apache Spark
>            Priority: Minor
>
> This works:
> {noformat}
> scala> sql("select named_struct('d1', 123) c1, 12 c2").write.format("parquet").saveAsTable("structtbl2")
> scala> sql("show create table structtbl2").show(truncate=false)
> +---------------------------------------------------------------------------+
> |createtab_stmt                                                             |
> +---------------------------------------------------------------------------+
> |CREATE TABLE `structtbl2` (`c1` STRUCT<`d1`: INT>, `c2` INT)
> USING parquet
> |
> +---------------------------------------------------------------------------+
> scala> sql("insert into structtbl2 values (struct(789), 17)")
> res2: org.apache.spark.sql.DataFrame = []
> scala> sql("select * from structtbl2").show
> +-----+---+
> |   c1| c2|
> +-----+---+
> |[789]| 17|
> |[123]| 12|
> +-----+---+
> scala>
> {noformat}
> However, if the table's only column is the struct column, the insert does not work:
> {noformat}
> scala> sql("select named_struct('d1', 123) c1").write.format("parquet").saveAsTable("structtbl1")
> scala> sql("show create table structtbl1").show(truncate=false)
> +-----------------------------------------------------------------+
> |createtab_stmt                                                   |
> +-----------------------------------------------------------------+
> |CREATE TABLE `structtbl1` (`c1` STRUCT<`d1`: INT>)
> USING parquet
> |
> +-----------------------------------------------------------------+
> scala> sql("insert into structtbl1 values (struct(789))")
> org.apache.spark.sql.AnalysisException: cannot resolve '`col1`' due to data type mismatch: cannot cast int to struct<d1:int>;;
> 'InsertIntoHadoopFsRelationCommand file:/Users/brobbins/github/spark_upstream/spark-warehouse/structtbl1, false, Parquet, Map(path -> file:/Users/brobbins/github/spark_upstream/spark-warehouse/structtbl1), Append, CatalogTable(
> ...etc...
> {noformat}
> I can work around it by using a named_struct as the value:
> {noformat}
> scala> sql("insert into structtbl1 values (named_struct('d1',789))")
> res7: org.apache.spark.sql.DataFrame = []
> scala> sql("select * from structtbl1").show
> +-----+
> |   c1|
> +-----+
> |[789]|
> |[123]|
> +-----+
> scala>
> {noformat}
> My guess is that I just don't understand how structs work. But maybe this is a bug.
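
For anyone who wants to try the workaround outside spark-shell, the following is a minimal sketch of the named_struct variant from the report as a standalone program. The object name, the app name, the local[1] master, and the "overwrite" save mode are assumptions added for illustration and are not part of the original report. It only restates the workaround the reporter describes and says nothing about the underlying cause.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; the report itself uses spark-shell.
object StructInsertWorkaround {
  def main(args: Array[String]): Unit = {
    // Assumption: a local single-threaded session is enough to reproduce this.
    val spark = SparkSession.builder()
      .appName("SPARK-26707-workaround")
      .master("local[1]")
      .getOrCreate()

    // Create a table whose only column is a struct, as in the report.
    // mode("overwrite") is an addition so the sketch can be rerun.
    spark.sql("select named_struct('d1', 123) c1")
      .write.format("parquet").mode("overwrite").saveAsTable("structtbl1")

    // The report states that values (struct(789)) fails with an
    // AnalysisException here, while naming the field explicitly succeeds.
    spark.sql("insert into structtbl1 values (named_struct('d1', 789))")

    // Show the rows now stored in the table.
    spark.sql("select * from structtbl1").show()

    spark.stop()
  }
}
```

Running it needs a Spark SQL dependency on the classpath; the versions listed under "Affects Versions" above are where the report says the failing form occurs.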



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


