spark-issues mailing list archives

From "Alexander Hagerf (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-29324) saveAsTable with overwrite mode results in metadata loss
Date Mon, 07 Oct 2019 08:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-29324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945676#comment-16945676 ]

Alexander Hagerf commented on SPARK-29324:
------------------------------------------

I fail to see the point: this is the expected behaviour, since you are overwriting the table.

> saveAsTable with overwrite mode results in metadata loss
> --------------------------------------------------------
>
>                 Key: SPARK-29324
>                 URL: https://issues.apache.org/jira/browse/SPARK-29324
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Karuppayya
>            Priority: Major
>
> {code:java}
> scala> spark.range(1).write.option("path", "file:///tmp/tbl").format("orc").saveAsTable("tbl")
> scala> spark.sql("desc extended tbl").collect.foreach(println)
> [id,bigint,null]
> [,,]
> [# Detailed Table Information,,]
> [Database,default,]
> [Table,tbl,]
> [Owner,karuppayyar,]
> [Created Time,Wed Oct 02 09:29:06 IST 2019,]
> [Last Access,UNKNOWN,]
> [Created By,Spark 3.0.0-SNAPSHOT,]
> [Type,EXTERNAL,]
> [Provider,orc,]
> [Location,file:/tmp/tbl_loc,]
> [Serde Library,org.apache.hadoop.hive.ql.io.orc.OrcSerde,]
> [InputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
> [OutputFormat,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]
> {code}
> {code:java}
> // Overwriting table
> scala> spark.range(100).write.mode("overwrite").saveAsTable("tbl")
> scala> spark.sql("desc extended tbl").collect.foreach(println)
> [id,bigint,null]
> [,,]
> [# Detailed Table Information,,]
> [Database,default,]
> [Table,tbl,]
> [Owner,karuppayyar,]
> [Created Time,Wed Oct 02 09:30:36 IST 2019,]
> [Last Access,UNKNOWN,]
> [Created By,Spark 3.0.0-SNAPSHOT,]
> [Type,MANAGED,]
> [Provider,parquet,]
> [Location,file:/tmp/tbl,]
> [Serde Library,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,]
> [InputFormat,org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,]
> [OutputFormat,org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,]
> {code}
>  
>  
> The first code block creates an EXTERNAL table in ORC format.
> The second code block overwrites it with more data.
> After the overwrite:
> 1. The external table has become a managed table.
> 2. The file format has changed from ORC to Parquet (the default file format).
> Other information (such as owner and TBLPROPERTIES) is also overwritten.
>  
>  
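
For readers who want to replace the rows without hitting the metadata changes described in the quoted issue, two possible approaches are sketched below. This is a minimal sketch for spark-shell (where `spark` is predefined), not something taken from the issue itself, and it has not been verified against 3.0.0-SNAPSHOT. It assumes the table `tbl` from the first code block already exists and reuses the path `file:///tmp/tbl` from that block. Option A is not expected to preserve owner or TBLPROPERTIES, because the table is still recreated.

```scala
// Option A (assumption: recreating the table is acceptable).
// Repeat the format and path on the overwrite so the recreated table
// is again an ORC table at an explicit location, instead of falling
// back to the default format and a managed location.
spark.range(100).write
  .mode("overwrite")
  .format("orc")
  .option("path", "file:///tmp/tbl")
  .saveAsTable("tbl")

// Option B (assumption: only the data should change).
// insertInto writes into the existing table definition, so with
// overwrite mode the rows are replaced while the table itself is
// not dropped and recreated.
spark.range(100).write
  .mode("overwrite")
  .insertInto("tbl")
```

Note that `insertInto` matches columns by position rather than by name, so the DataFrame's column order has to line up with the table schema.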



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

