I couldn’t reproduce this with the following spark-shell snippet:
scala> import sqlContext.implicits._
scala> Seq((1, 2)).toDF("a", "b")
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
scala> res0.save("xxx", org.apache.spark.sql.SaveMode.Overwrite)
The _common_metadata file is typically much smaller than _metadata, because it doesn’t contain row group information, and thus can be faster to read than _metadata.
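If you want to check which summary files were written and how their sizes compare, something like the following rough sketch can be pasted into spark-shell. It assumes the output went to the local filesystem (for HDFS you would need the Hadoop FileSystem API instead), and summarySizes is just a helper name I made up for illustration, not part of Spark or Parquet.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper: returns the size in bytes of each Parquet summary
// file that is present in `dir`. Files that are missing are left out.
def summarySizes(dir: Path): Map[String, Long] =
  Seq("_metadata", "_common_metadata")
    .map(name => name -> dir.resolve(name))
    .collect { case (name, p) if Files.isRegularFile(p) => name -> Files.size(p) }
    .toMap

// "xxx" is the output path from the snippet above; assumed to be a local directory.
summarySizes(Paths.get("xxx")).foreach { case (name, bytes) =>
  println(s"$name: $bytes bytes")
}
```

If _common_metadata is missing from the output directory, it will not appear in the printed list.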
On 3/26/15 12:48 PM, Pei-Lun Lee wrote:
When I save a Parquet file with SaveMode.Overwrite, it never generates _common_metadata, whether or not it overwrites an existing directory. Is this expected behavior? And what is the benefit of _common_metadata? Will reads perform better when it is present?