spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Getting PySpark Partitions Locations
Date Thu, 25 Jun 2020 13:04:06 GMT
By doing a select on the df?

> On 25 Jun 2020, at 14:52, Tzahi File <tzahi.file@ironsrc.com> wrote:
> 
> 
> Hi,
> 
> I'm using PySpark to write a DataFrame to S3 with the following command: "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
> 
> Is there any way to get the partitions created?
> e.g. 
> day=2020-06-20/hour=1/country=US
> day=2020-06-20/hour=2/country=US
> ......
> 
> -- 
> Tzahi File
> Data Engineer
> 
> email tzahi.file@ironsrc.com
> mobile +972-546864835
> fax +972-77-5448273
> ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
> ironsrc.com
> 
