spark-user mailing list archives

From Jerry Vinokurov <>
Subject Re: Using Percentile in Spark SQL
Date Mon, 11 Nov 2019 15:13:56 GMT
Do you mean that you are trying to compute the percent rank of some data?
You can use the Spark SQL percent_rank window function for that, but I don't
think that's going to give you any improvement over calling the equivalent
percent_rank function through the DataFrame API. Are you currently using a
user-defined function for this task? Because I bet that's what's slowing you
down.
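
To make the distinction concrete, here is a minimal plain-Python sketch (not
Spark code) of the two computations that tend to get mixed up: percent rank,
which assigns every row a relative position, and a percentile, which reduces
a column to a single value. The function names and the sample data are
hypothetical. The percent-rank formula, (rank - 1) / (rows - 1), matches how
percent_rank is documented; the linear interpolation used for the percentile
is my understanding of how the exact percentile aggregate in Spark SQL
behaves, so treat that part as an assumption.

```python
# Plain-Python illustration only; names and data are hypothetical.

def percent_rank(values):
    """Return (value, percent_rank) pairs in sorted order.

    percent_rank = (rank - 1) / (n - 1); tied values share the lowest rank.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 1:
        # A single row has nothing to be ranked against.
        return [(ordered[0], 0.0)]
    first_rank = {}
    for position, value in enumerate(ordered, start=1):
        # Keep only the first (lowest) position seen for each value.
        first_rank.setdefault(value, position)
    return [(v, (first_rank[v] - 1) / (n - 1)) for v in ordered]


def percentile(values, p):
    """Exact percentile with linear interpolation between sorted neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


sample = [10, 20, 20, 40, 50]
ranks = percent_rank(sample)      # one result per input row
median = percentile(sample, 0.5)  # one result for the whole column
p90 = percentile(sample, 0.9)
```

If what you need is the second kind of result, the built-in percentile and
percentile_approx aggregates in Spark SQL are probably closer to what the
Hive job does than percent_rank is.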

On Mon, Nov 11, 2019 at 9:46 AM Tzahi File <> wrote:

> Hi,
> Currently, I'm using a huge Hive cluster (m5.24xl * 40 workers) to run a
> percentile function. I'm trying to improve this job by moving it to run
> with Spark SQL.
> Any suggestions on how to use a percentile function in Spark?
> Thanks,
> --
> Tzahi File
> Data Engineer

