spark-dev mailing list archives

From Liang-Chi Hsieh <vii...@gmail.com>
Subject Re: approx_percentile computation
Date Thu, 02 Feb 2017 06:48:24 GMT

Hi,

You don't need to run approxPercentile against a list. Since it is an
aggregation function, you can simply run:

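// Just to illustrate the idea. This uses internal Catalyst classes,
// which can change between Spark versions.
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
import org.apache.spark.sql.functions.{col, collect_list}

// `percentage` is a Double between 0.0 and 1.0, e.g. 0.5 for the median.
val approxPercentile = new ApproximatePercentile(col("v1").expr, Literal(percentage))
val aggApproxPercentile = new Column(approxPercentile.toAggregateExpression())

df.groupBy("k1", "k2", "k3").agg(collect_list("v1"), aggApproxPercentile)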
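
The same per-group aggregation can also be written against the public API only. What follows is a minimal sketch, assuming Spark 2.1 or later, where the SQL function percentile_approx is available through functions.expr. The local session setup, the alias v1_quantiles and the percentages 0.1 and 0.5 are illustrative choices, not something from the original question. It can be pasted into spark-shell as is.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a throwaway local session, only for this sketch.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("approx-percentile-sketch")
  .getOrCreate()
import spark.implicits._

// Sample rows taken from the question.
val df = Seq(
  ("a1", "b1", "c1", 879),
  ("a2", "b2", "c2", 769),
  ("a1", "b1", "c1", 129),
  ("a2", "b2", "c2", 323)
).toDF("k1", "k2", "k3", "v1")

// One row per (k1, k2, k3) group, with an array holding the approximate
// 10th and 50th percentiles of v1. No collect_list is needed first.
val quantiles = df
  .groupBy("k1", "k2", "k3")
  .agg(expr("percentile_approx(v1, array(0.1, 0.5))").as("v1_quantiles"))

quantiles.show(false)
```

Because percentile_approx is itself an aggregate, it consumes the grouped rows directly, so the list of values per group never has to be materialized.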



Rishi wrote
> I need to compute quantiles in Spark on a numeric field after a group
> by operation. Is there a way to apply approxPercentile to an
> aggregated list instead of a column?
>
> E.g. the DataFrame looks like:
>
> k1 | k2 | k3 | v1
> a1 | b1 | c1 | 879
> a2 | b2 | c2 | 769
> a1 | b1 | c1 | 129
> a2 | b2 | c2 | 323
>
> I need to first run groupBy(k1, k2, k3) and collect_list(v1), and then
> compute quantiles [10th, 50th...] on the list of v1's.





-----
Liang-Chi Hsieh | @viirya 
Spark Technology Center 
http://www.spark.tc/ 
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/approx-percentile-computation-tp20820p20823.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
