spark-user mailing list archives

From Rishi Shah
Subject [Pyspark 2.4] not able to partition the data frame by dates
Date Thu, 01 Aug 2019 02:55:04 GMT
Hi All,

I have a 2.7 TB dataframe (parquet) that I need to partition by date.
However, the Spark program below doesn't work: it keeps failing with a
*file already exists* exception.

df =

I did notice that a couple of tasks failed. Is that probably why Spark
spun up new task attempts, which then write to the same .staging directory?
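
For readers following the thread, here is a minimal sketch of a date-partitioned parquet write of the kind described. It is not the original program, which is cut off above. The function name `write_partitioned_by_date`, the column name `date`, and the paths in the usage comment are all hypothetical. Repartitioning on the date column first is one common approach, so that each date's rows are written by a single task. Whether it avoids the exception depends on why the first task attempts failed.

```python
def write_partitioned_by_date(df, output_path, date_col="date", mode="overwrite"):
    """Write a Spark DataFrame as parquet, one sub-directory per date value.

    Assumptions (not from the original post): the DataFrame has a column
    named by `date_col`, and replacing any existing output is acceptable.
    """
    (
        df.repartition(date_col)   # shuffle so rows with the same date share a task
        .write
        .partitionBy(date_col)     # directory layout: <output_path>/<date_col>=<value>/
        .mode(mode)                # "overwrite" replaces existing output at output_path
        .parquet(output_path)
    )


# Hypothetical usage (paths are placeholders, requires a running SparkSession):
#   df = spark.read.parquet("/path/to/input")
#   write_partitioned_by_date(df, "/path/to/output", date_col="date")
```

Note that with heavily skewed dates, repartitioning on the date column alone can leave a few very large tasks, so this is a starting point rather than a tuned job.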


Rishi Shah
