spark-user mailing list archives

From Romi Kuntsman <r...@totango.com>
Subject Re: How to overwrite partition when writing Parquet?
Date Thu, 20 Aug 2015 10:45:23 GMT
Cheng - what if I want to overwrite a specific partition?

I'll have to remove the folder, as Hemant suggested...
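
For reference, a minimal sketch of removing one partition folder before re-writing it. It assumes the table lives on a local filesystem and uses the `date=<value>` directory layout from the partition discovery docs; `PartitionCleaner`, `deletePartition` and the sample file name are hypothetical names for illustration, not part of Spark. On HDFS the same step would presumably go through Hadoop's `FileSystem.delete(path, true)` rather than `java.nio`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a single partition directory of a
// partitioned Parquet table, leaving all other partitions untouched.
class PartitionCleaner {

    /**
     * Deletes <tableDir>/<column>=<value> and everything under it.
     * Returns true if the directory existed and was removed, false otherwise.
     */
    static boolean deletePartition(Path tableDir, String column, String value)
            throws IOException {
        Path partitionDir = tableDir.resolve(column + "=" + value);
        if (!Files.isDirectory(partitionDir)) {
            return false;
        }
        // Collect every path under the partition, children before parents,
        // so that each directory is empty by the time it is deleted.
        List<Path> toDelete;
        try (Stream<Path> paths = Files.walk(partitionDir)) {
            toDelete = paths.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway table directory with two date partitions.
        Path tableDir = Files.createTempDirectory("parquetDir");
        Files.createDirectories(tableDir.resolve("date=2015-08-19"));
        Path today = Files.createDirectories(tableDir.resolve("date=2015-08-20"));
        Files.write(today.resolve("part-00000.parquet"), new byte[] {0});

        // Drop only the partition that is about to be re-written.
        boolean removed = deletePartition(tableDir, "date", "2015-08-20");
        System.out.println("removed: " + removed);
    }
}
```

After the folder is gone, the day's rows can be written with SaveMode.Append, and the other dates stay as they were.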

On Thu, Aug 20, 2015 at 1:17 PM Cheng Lian <lian.cs.zju@gmail.com> wrote:

> You can first apply a filter to select only the data for the dates you need,
> and then append that data.
>
>
> Cheng
>
>
> On 8/20/15 4:59 PM, Hemant Bhanawat wrote:
>
>> How can I overwrite only a given partition or manually remove a partition
>> before writing?
>
> I don't know of (and I don't think there is) a way to do that using a save
> mode. But wouldn't manually deleting the directory of that particular
> partition help? For the directory structure, see:
>
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery
>
>
> On Wed, Aug 19, 2015 at 8:18 PM, Romi Kuntsman <romi@totango.com> wrote:
>
>> Hello,
>>
>> I have a DataFrame, with a date column which I want to use as a partition.
>> Each day I want to write the data for the same date in Parquet, and then
>> read a dataframe for a date range.
>>
>> I'm using:
>>
>> myDataframe.write().partitionBy("date").mode(SaveMode.Overwrite).parquet(parquetDir);
>>
>> If I use SaveMode.Append, then writing data for the same partition adds
>> the same data there again.
>> If I use SaveMode.Overwrite, then writing data for a single partition
>> removes all the data for all partitions.
>>
>> How can I overwrite only a given partition or manually remove a partition
>> before writing?
>>
>> Many thanks!
>> Romi K.
>>
>
>
>
