spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: distributed computation of median
Date Mon, 17 Apr 2017 16:40:07 GMT
Also, Q-tree is implemented in Algebird, and it is not hard to get it going in Spark.
It is another probabilistic data structure that is useful for this.
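
To make the idea concrete, here is a minimal sketch of why these structures suit a distributed median: each partition builds a small summary, and summaries merge associatively. It is neither Q-tree nor t-digest, only a simplified fixed-width histogram stand-in. The class name HistogramSketch, the helper summarize, and the assumed value range [0, 1000) with 100 bins are all hypothetical choices for illustration.

```python
from functools import reduce


class HistogramSketch:
    """Fixed-width histogram over [lo, hi). A simplified stand-in,
    NOT Algebird's QTree and NOT t-digest (hypothetical class)."""

    def __init__(self, lo, hi, bins):
        self.lo = lo
        self.hi = hi
        self.bins = bins
        self.width = (hi - lo) / bins
        self.counts = [0] * bins
        self.total = 0

    def add(self, x):
        # Values outside [lo, hi) are clamped into the first or last bin.
        i = int((x - self.lo) / self.width)
        i = min(max(i, 0), self.bins - 1)
        self.counts[i] += 1
        self.total += 1
        return self

    def merge(self, other):
        # Element-wise addition of counts: associative and commutative,
        # so partial summaries can be combined in any order.
        out = HistogramSketch(self.lo, self.hi, self.bins)
        out.counts = [a + b for a, b in zip(self.counts, other.counts)]
        out.total = self.total + other.total
        return out

    def quantile(self, q):
        # Return the midpoint of the bin where the cumulative count
        # first reaches q * total.
        if self.total == 0:
            raise ValueError("empty sketch")
        target = q * self.total
        running = 0
        for i, c in enumerate(self.counts):
            running += c
            if running > 0 and running >= target:
                return self.lo + (i + 0.5) * self.width
        return self.hi


def summarize(values):
    # Assumed range and bin count; in practice these must be known up front.
    sketch = HistogramSketch(0, 1000, 100)
    for v in values:
        sketch.add(v)
    return sketch


# Simulate four partitions of the values 0..999.
data = list(range(1000))
partitions = [data[i::4] for i in range(4)]

# One summary per partition, then a pairwise merge.
merged = reduce(lambda a, b: a.merge(b), map(summarize, partitions))
estimate = merged.quantile(0.5)
print(estimate)
```

For in-range data the estimate lands within about one bin width of the true median, so the bin count is the accuracy knob here. In Spark the same shape would map onto something like RDD.aggregate or treeAggregate, with add as the per-element step and merge as the combiner. The real structures adapt their resolution to the data instead of needing a fixed range.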

On Apr 17, 2017 11:27, "Jason White" <jason.white@shopify.com> wrote:

> Have you looked at t-digests?
>
> Calculating percentiles (including medians) is something that is inherently
> difficult/inefficient to do in a distributed system. T-digests provide a
> useful probabilistic structure to allow you to compute any percentile with
> a known (and tunable) margin of error.
>
> https://github.com/tdunning/t-digest
