spark-dev mailing list archives

From Yash Sharma <yash...@gmail.com>
Subject Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?
Date Thu, 07 Jul 2016 02:09:27 GMT
Hi All,
While writing a partitioned DataFrame as partitioned text files, I see that
Spark deletes all existing partitions even when it is writing only a few new ones.

dataDF.write.partitionBy("year", "month", "date")
  .mode(SaveMode.Overwrite)
  .text("s3://data/test2/events/")


Is this expected behavior?

I have a correction job for past data that overwrites a couple of past
partitions based on newly arriving data. I only want to remove those
partitions.

Is there a neater way to do that than:
- Find the partitions
- Delete them using the Hadoop APIs
- Write the DataFrame in Append mode
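For reference, the three-step workaround above might look roughly like the sketch below. It is a hypothetical helper (the object OverwritePartitions and its methods are illustrative names, not a Spark API) and assumes Spark 2.x (for DataFrame.sparkSession), Hive-style directory names of the form col=value, and simple partition values (ints, strings without special characters, no nulls), since Spark escapes special characters and writes nulls under a default-partition directory name.

  import org.apache.hadoop.fs.Path
  import org.apache.spark.sql.{DataFrame, SaveMode}

  object OverwritePartitions {
    // Builds the Hive-style directory for one partition,
    // e.g. base/year=2016/month=7/date=3 (assumes no escaping is needed)
    def partitionPath(base: String, cols: Seq[String], values: Seq[Any]): String =
      cols.zip(values)
        .map { case (c, v) => s"$c=$v" }
        .mkString(base.stripSuffix("/") + "/", "/", "")

    // Hypothetical helper: replaces only the partitions present in df
    def overwritePartitions(df: DataFrame, base: String, cols: Seq[String]): Unit = {
      // 1. Find the distinct partition values in the new data
      val touched = df.select(cols.map(df.col): _*).distinct().collect()

      // 2. Delete only those partition directories via the Hadoop FileSystem API
      val conf = df.sparkSession.sparkContext.hadoopConfiguration
      touched.foreach { row =>
        val p = new Path(partitionPath(base, cols, row.toSeq))
        val fs = p.getFileSystem(conf)
        if (fs.exists(p)) fs.delete(p, true) // recursive delete
      }

      // 3. Append the new data; untouched partitions are left as they are
      df.write.partitionBy(cols: _*).mode(SaveMode.Append).text(base)
    }
  }

  // Usage:
  // OverwritePartitions.overwritePartitions(dataDF, "s3://data/test2/events/",
  //   Seq("year", "month", "date"))

Note that the delete and the append are not atomic, so a failure between steps 2 and 3 leaves the affected partitions empty. Later Spark releases (2.3 and up) added the spark.sql.sources.partitionOverwriteMode setting; when it is set to "dynamic", SaveMode.Overwrite replaces only the partitions present in the DataFrame being written.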


Cheers
Yash
