spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: SparkSQL overwrite parquet file does not generate _common_metadata
Date Thu, 26 Mar 2015 11:26:56 GMT
I couldn’t reproduce this with the following spark-shell snippet:

scala> import sqlContext.implicits._
scala> Seq((1, 2)).toDF("a", "b")
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
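
To check which summary files the write produced, you can list the output
directory from the same shell. A minimal sketch, assuming the snippet above
wrote to a local directory named xxx:

scala> // list the files Parquet wrote alongside the part files
scala> new java.io.File("xxx").listFiles.map(_.getName).sorted.foreach(println)

If summary file generation is working, both _metadata and _common_metadata
should show up next to the part-*.parquet files (plus _SUCCESS).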

The _common_metadata file is typically much smaller than _metadata, because 
it doesn't contain row group information, and can thus be faster to read.
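
One place this matters is schema discovery when reading the directory back.
Another sketch, again assuming the xxx output directory from the snippet
above (parquetFile is the Spark 1.3-era API):

scala> // read the directory back; when a summary file is present, the
scala> // reader can take the merged schema from it instead of opening
scala> // the footer of every individual part file
scala> val df = sqlContext.parquetFile("xxx")
scala> df.printSchema()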

Cheng

On 3/26/15 12:48 PM, Pei-Lun Lee wrote:

> Hi,
>
> When I save a Parquet file with SaveMode.Overwrite, it never generates 
> _common_metadata, whether or not it overwrites an existing directory.
> Is this expected behavior?
> And what is the benefit of _common_metadata? Will reading perform 
> better when it is present?
>
> Thanks,
> --
> Pei-Lun

