spark-user mailing list archives

From RK Aduri
Subject Re: Spark SQL overwrite/append for partitioned tables
Date Mon, 25 Jul 2016 23:23:41 GMT
You can write the data that you would like to overwrite to a temporary location, then swap
it with the existing partition whose data you want to wipe. The swap can be done with a
simple rename of the partition directory, followed by a repair of the table so that it
picks up the new partition.

I am not sure whether that addresses your scenario.
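
The rename-based swap described above can be sketched as follows. This is only a minimal illustration on a local filesystem, assuming a Hive-style layout with one directory per partition (for example `date=2016-07-25`); the function name `swap_partition`, the hidden backup directory name, and the paths are hypothetical, and on HDFS the same steps would go through the filesystem's own rename operation instead of `os.rename`.

```python
import os
import shutil
from pathlib import Path


def swap_partition(table_root: Path, partition: str, staged_dir: Path) -> Path:
    """Replace one partition directory with freshly staged data.

    Other partition directories under table_root are left untouched.
    staged_dir must be on the same filesystem as table_root so that
    the rename is a cheap metadata operation.
    """
    target = table_root / partition
    # Hypothetical naming choice: a dot-prefixed directory for the old data.
    backup = table_root / f".{partition}.old"

    # Clear any backup left behind by an earlier failed run.
    if backup.exists():
        shutil.rmtree(backup)

    # Move the existing partition aside (if present), then move the staged data in.
    if target.exists():
        os.rename(target, backup)
    os.rename(staged_dir, target)

    # Drop the old data only once the new partition is in place.
    if backup.exists():
        shutil.rmtree(backup)
    return target
```

Because the job writes to the staging directory first, rerunning it after a failure simply replaces that one date again, and data for other dates is never touched. If the partition is new to the metastore, a repair statement such as `MSCK REPAIR TABLE events` (table name assumed) would then be needed so the table picks it up.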

> On Jul 25, 2016, at 4:18 PM, Pedro Rodriguez wrote:
> What would be the best way to accomplish the following behavior:
> 1. There is a table which is partitioned by date
> 2. When the Spark job runs for a particular date, we would like it to wipe out all data
> for that date. This makes the job idempotent and lets us rerun it after a failure without
> fear of duplicated data
> 3. Preserve data for all other dates
> I am guessing that overwrite would not work here, or if it does, it's not guaranteed to
> stay that way, but I am not sure. If that's the case, is there a good/robust way to get this
> -- 
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni

