spark-issues mailing list archives

From "Apache Spark (JIRA)" <>
Subject [jira] [Commented] (SPARK-25313) Fix regression in FileFormatWriter output schema
Date Mon, 03 Sep 2018 07:20:00 GMT


Apache Spark commented on SPARK-25313:

User 'gengliangwang' has created a pull request for this issue:

> Fix regression in FileFormatWriter output schema
> ------------------------------------------------
>                 Key: SPARK-25313
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Gengliang Wang
>            Priority: Major
> In the following example:
>         val location = "/tmp/t"
>         val df = spark.range(10).toDF("id")
>         df.write.format("parquet").saveAsTable("tbl")
>         spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
>         spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet location '$location'")
>         spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
>         println(spark.read.parquet(location).schema)
>         spark.table("tbl2").show()
> The output column name in the written schema will be id instead of ID, so the last query shows nothing from tbl2.
> Enabling the debug messages shows that the output name is changed from ID to id, and that the outputColumns in InsertIntoHadoopFsRelationCommand are then changed in RemoveRedundantAliases.
> To guarantee correctness, we should change the output columns from `Seq[Attribute]` to `Seq[String]` so that the optimizer cannot replace their names.
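
The proposal in the description can be illustrated with a small, self-contained sketch. It does not use Spark's real classes: `Attr`, `WriteWithAttrs`, `WriteWithNames`, and this `removeRedundantAliases` are hypothetical stand-ins, and the rewrite is simplified to "replace the name with the underlying column's name". It only aims to show why names held as plain strings survive a rule that rewrites attributes.

```scala
// Toy model only: none of these types are Spark's actual classes.
final case class Attr(name: String, exprId: Long)

// Hypothetical pre-fix shape: the write command holds attributes,
// which an optimizer rule is free to rewrite.
final case class WriteWithAttrs(outputColumns: Seq[Attr])

// Hypothetical post-fix shape: the write command holds only the names,
// so there is nothing for an attribute-rewriting rule to touch.
final case class WriteWithNames(outputColumnNames: Seq[String])

object OutputNamesSketch {
  // Simplified stand-in for an alias-removal rule: if an attribute maps to
  // an underlying column, take that column's name instead of the alias.
  def removeRedundantAliases(attrs: Seq[Attr], underlying: Map[Long, String]): Seq[Attr] =
    attrs.map(a => underlying.get(a.exprId).fold(a)(n => a.copy(name = n)))

  def main(args: Array[String]): Unit = {
    val requested  = Seq(Attr("ID", 1L)) // name the user asked for
    val underlying = Map(1L -> "id")     // name of the source column

    // Attributes are rewritten, so the requested name is lost.
    val before   = WriteWithAttrs(requested)
    val afterOpt = before.copy(
      outputColumns = removeRedundantAliases(before.outputColumns, underlying))
    println(afterOpt.outputColumns.map(_.name)) // List(id)

    // Names captured as strings are unaffected by the rule.
    val fixed = WriteWithNames(requested.map(_.name))
    println(fixed.outputColumnNames) // List(ID)
  }
}
```

In this sketch the string-based command keeps `ID` because the rule only ever sees `Attr` values. Whether the actual fix is structured this way is not stated in the message above.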

This message was sent by Atlassian JIRA
