spark-issues mailing list archives

From "Karuppayya (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-29324) saveAsTable with overwrite mode results in metadata loss
Date Wed, 02 Oct 2019 04:55:00 GMT
Karuppayya created SPARK-29324:
----------------------------------

             Summary: saveAsTable with overwrite mode results in metadata loss
                 Key: SPARK-29324
                 URL: https://issues.apache.org/jira/browse/SPARK-29324
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Karuppayya


{code:scala}

scala> spark.range(1).write.option("path", "file:///tmp/tbl_loc").format("orc").saveAsTable("tbl")

scala> spark.sql("desc extended tbl").collect.foreach(println)
[id,bigint,null]
[,,]
[# Detailed Table Information,,]
[Database,default,]
[Table,tbl,]
[Owner,karuppayyar,]
[Created Time,Wed Oct 02 09:29:06 IST 2019,]
[Last Access,UNKNOWN,]
[Created By,Spark 3.0.0-SNAPSHOT,]
[Type,EXTERNAL,]
[Provider,orc,]
[Location,file:/tmp/tbl_loc,]
[Serde Library,org.apache.hadoop.hive.ql.io.orc.OrcSerde,]
[InputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,]
[OutputFormat,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,]

{code}
{code:scala}
// Overwriting table
scala> spark.range(100).write.mode("overwrite").saveAsTable("tbl")

scala> spark.sql("desc extended tbl").collect.foreach(println)
[id,bigint,null]
[,,]
[# Detailed Table Information,,]
[Database,default,]
[Table,tbl,]
[Owner,karuppayyar,]
[Created Time,Wed Oct 02 09:30:36 IST 2019,]
[Last Access,UNKNOWN,]
[Created By,Spark 3.0.0-SNAPSHOT,]
[Type,MANAGED,]
[Provider,parquet,]
[Location,file:/tmp/tbl,]
[Serde Library,org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe,]
[InputFormat,org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,]
[OutputFormat,org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat,]
{code}

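The first code block creates an EXTERNAL table in ORC format at an explicit path.

The second code block overwrites it with more data using saveAsTable in overwrite mode, without specifying a format or a path.

After the overwrite:

1. The external table became a managed table.

2. The file format changed from ORC to Parquet (the default file format).

The table location also changed (from file:/tmp/tbl_loc to file:/tmp/tbl), and other metadata (such as the owner and TBLPROPERTIES) is overwritten as well.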
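When reproducing this, it can help to compare the two `desc extended` outputs field by field rather than by eye. The following is a minimal sketch that assumes the printed rows have been captured as strings, one per row. `TableMetadataDiff`, `parse` and `diff` are hypothetical helpers and not Spark APIs. The sketch also assumes that no key or value contains a comma.

```scala
// Hypothetical helper, not a Spark API: compares two captured outputs of
// spark.sql("desc extended tbl").collect.foreach(println), one row per string.
object TableMetadataDiff {

  // Turns rows such as "[Type,EXTERNAL,]" into ("Type", "EXTERNAL").
  // Blank rows ("[,,]") and section headers ("[# ...,,]") are skipped.
  // Assumption: keys and values contain no commas.
  def parse(rows: Seq[String]): Map[String, String] =
    rows.flatMap { row =>
      val fields = row.stripPrefix("[").stripSuffix("]").split(",", -1)
      if (fields.length >= 2 && fields(0).nonEmpty && !fields(0).startsWith("#"))
        Some((fields(0), fields(1)))
      else None
    }.toMap

  // Every key whose value differs (or exists on one side only), mapped to (before, after).
  def diff(before: Map[String, String], after: Map[String, String]): Map[String, (String, String)] =
    (before.keySet ++ after.keySet).toSeq.flatMap { key =>
      val b = before.getOrElse(key, "<absent>")
      val a = after.getOrElse(key, "<absent>")
      if (b != a) Some((key, (b, a))) else None
    }.toMap

  def main(args: Array[String]): Unit = {
    // Subset of the rows printed before and after the overwrite in this report.
    val before = Seq("[# Detailed Table Information,,]", "[Owner,karuppayyar,]",
      "[Type,EXTERNAL,]", "[Provider,orc,]", "[Location,file:/tmp/tbl_loc,]")
    val after = Seq("[# Detailed Table Information,,]", "[Owner,karuppayyar,]",
      "[Type,MANAGED,]", "[Provider,parquet,]", "[Location,file:/tmp/tbl,]")

    // Sorted by key so the output order is stable. Prints:
    // Location: file:/tmp/tbl_loc -> file:/tmp/tbl
    // Provider: orc -> parquet
    // Type: EXTERNAL -> MANAGED
    diff(parse(before), parse(after)).toSeq.sortBy(_._1).foreach { case (key, (b, a)) =>
      println(s"$key: $b -> $a")
    }
  }
}
```

If you run the same comparison on the full output above, it also flags Created Time and the Serde Library, InputFormat and OutputFormat rows. Those rows change together with the provider.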

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org
