spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tzahi File <tzahi.f...@ironsrc.com>
Subject Re: Getting PySpark Partitions Locations
Date Thu, 25 Jun 2020 13:18:47 GMT
I don't want to query with a distinct on the partitioned columns, the df
contains over 1 Billion of records.
I just want to know the partitions that were created..

On Thu, Jun 25, 2020 at 4:04 PM Jörn Franke <jornfranke@gmail.com> wrote:

> By doing a select on the df ?
>
> Am 25.06.2020 um 14:52 schrieb Tzahi File <tzahi.file@ironsrc.com>:
>
> 
> Hi,
>
> I'm using pyspark to write df to s3, using the following command:
> "df.write.partitionBy("day","hour","country").mode("overwrite").parquet(s3_output)".
>
> Is there any way to get the partitions created?
> e.g.
> day=2020-06-20/hour=1/country=US
> day=2020-06-20/hour=2/country=US
> ......
>
> --
> Tzahi File
> Data Engineer
> [image: ironSource] <http://www.ironsrc.com/>
>
> email tzahi.file@ironsrc.com
> mobile +972-546864835
> fax +972-77-5448273
> ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
> ironsrc.com <http://www.ironsrc.com/>
> [image: linkedin] <https://www.linkedin.com/company/ironsource>[image:
> twitter] <https://twitter.com/ironsource>[image: facebook]
> <https://www.facebook.com/ironSource>[image: googleplus]
> <https://plus.google.com/+ironsrc>
> This email (including any attachments) is for the sole use of the intended
> recipient and may contain confidential information which may be protected
> by legal privilege. If you are not the intended recipient, or the employee
> or agent responsible for delivering it to the intended recipient, you are
> hereby notified that any use, dissemination, distribution or copying of
> this communication and/or its content is strictly prohibited. If you are
> not the intended recipient, please immediately notify us by reply email or
> by telephone, delete this email and destroy any copies. Thank you.
>
>

-- 
Tzahi File
Data Engineer
[image: ironSource] <http://www.ironsrc.com/>

email tzahi.file@ironsrc.com
mobile +972-546864835
fax +972-77-5448273
ironSource HQ - 121 Derech Menachem Begin st. Tel Aviv
ironsrc.com <http://www.ironsrc.com/>
[image: linkedin] <https://www.linkedin.com/company/ironsource>[image:
twitter] <https://twitter.com/ironsource>[image: facebook]
<https://www.facebook.com/ironSource>[image: googleplus]
<https://plus.google.com/+ironsrc>
This email (including any attachments) is for the sole use of the intended
recipient and may contain confidential information which may be protected
by legal privilege. If you are not the intended recipient, or the employee
or agent responsible for delivering it to the intended recipient, you are
hereby notified that any use, dissemination, distribution or copying of
this communication and/or its content is strictly prohibited. If you are
not the intended recipient, please immediately notify us by reply email or
by telephone, delete this email and destroy any copies. Thank you.

Mime
View raw message