spark-dev mailing list archives

From nirandap <niranda.per...@gmail.com>
Subject Re: Spark deletes all existing partitions in SaveMode.Overwrite - Expected behavior ?
Date Thu, 07 Jul 2016 04:00:49 GMT
Hi Yash,

Yes, AFAIK, that is the expected behavior of the Overwrite mode.

If you want to run a job on each partition instead, I think you can use one
of the following approaches:
[1] DataFrame.foreachPartition:
https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1444
[2] SparkContext.runJob:
https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/SparkContext.scala#L1818
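A rough sketch of both approaches, assuming a Spark 1.6 setup with a live
SparkContext `sc` and SQLContext `sqlContext` (the input path is just the one
from Yash's example, used for illustration):

```scala
import org.apache.spark.sql.Row

// Sketch only: assumes `sc` and `sqlContext` already exist.
val df = sqlContext.read.text("s3://data/test2/events/")

// [1] Run arbitrary per-partition logic with DataFrame.foreachPartition.
df.foreachPartition { rows: Iterator[Row] =>
  rows.foreach(r => println(r.getString(0)))
}

// [2] Lower level: submit a job over the underlying RDD with
// SparkContext.runJob; here each partition just reports its row count.
val counts: Array[Int] = sc.runJob(df.rdd, (it: Iterator[Row]) => it.size)
```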

Best

On Thu, Jul 7, 2016 at 7:40 AM, Yash Sharma [via Apache Spark Developers
List] <ml-node+s1001551n18219h51@n3.nabble.com> wrote:

> Hi All,
> While writing a partitioned data frame as partitioned text files, I see
> that Spark deletes all existing partitions while writing a few new
> partitions.
>
> dataDF.write.partitionBy("year", "month", "date")
>   .mode(SaveMode.Overwrite).text("s3://data/test2/events/")
>
>
> Is this an expected behavior ?
>
> I have a past-correction job that overwrites a couple of past partitions
> based on newly arriving data. I would want to remove only those
> partitions.
>
> Is there a neater way to do that other than:
> - Find the partitions
> - Delete them using the Hadoop APIs
> - Write the DF in Append mode
>
>
> Cheers
> Yash
>
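For what it's worth, the delete-then-append workaround Yash describes could
look roughly like this. This is a sketch under assumptions: `correctedDF` is a
hypothetical DataFrame holding the corrected rows, the partition values are
made up for illustration, and a live SparkContext `sc` is assumed.

```scala
import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SaveMode

val base = "s3://data/test2/events"
val fs = FileSystem.get(new URI(base), sc.hadoopConfiguration)

// 1. Delete only the partitions being corrected (hypothetical values).
for ((y, m, d) <- Seq(("2016", "07", "06"), ("2016", "07", "07"))) {
  fs.delete(new Path(s"$base/year=$y/month=$m/date=$d"), true) // recursive
}

// 2. Re-write the corrected data in Append mode so untouched partitions
//    survive; Append never clears the base directory.
correctedDF.write
  .partitionBy("year", "month", "date")
  .mode(SaveMode.Append)
  .text(base)
```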



-- 
Niranda
@n1r44 <https://twitter.com/N1R44>
+94-71-554-8430
https://pythagoreanscript.wordpress.com/



