spark-user mailing list archives

From Jerry Lam <chiling...@gmail.com>
Subject Re: Counting distinct values for a key?
Date Sun, 19 Jul 2015 21:28:28 GMT
You mean this does not work?

SELECT key, count(value) from table group by key



On Sun, Jul 19, 2015 at 2:28 PM, N B <nb.nospam@gmail.com> wrote:

> Hello,
>
> How do I go about performing the equivalent of the following SQL clause in
> Spark Streaming? I will be using this on a Windowed DStream.
>
> SELECT key, count(distinct(value)) from table group by key;
>
> so for example, given the following dataset in the table:
>
>  key | value
> -----+-------
>  k1  | v1
>  k1  | v1
>  k1  | v2
>  k1  | v3
>  k1  | v3
>  k2  | vv1
>  k2  | vv1
>  k2  | vv2
>  k2  | vv2
>  k2  | vv2
>  k3  | vvv1
>  k3  | vvv1
>
> the result will be:
>
>  key | count
> -----+-------
>  k1  |     3
>  k2  |     2
>  k3  |     1
>
> Thanks
> Nikunj
>
>
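For reference, the distinction the question turns on can be sketched in plain Python using the sample rows quoted above. This is only an illustration of the semantics, not Spark code: the helper names `count_per_key` and `count_distinct_per_key` are made up for this sketch. It shows that count(value) counts every row per key, while count(distinct(value)) counts each value once per key.

```python
from collections import defaultdict

# Sample rows copied from the table in the quoted message.
rows = [
    ("k1", "v1"), ("k1", "v1"), ("k1", "v2"), ("k1", "v3"), ("k1", "v3"),
    ("k2", "vv1"), ("k2", "vv1"), ("k2", "vv2"), ("k2", "vv2"), ("k2", "vv2"),
    ("k3", "vvv1"), ("k3", "vvv1"),
]


def count_per_key(pairs):
    """Hypothetical helper: SELECT key, count(value) ... GROUP BY key."""
    counts = defaultdict(int)
    for key, _value in pairs:
        counts[key] += 1  # every row counts, duplicates included
    return dict(counts)


def count_distinct_per_key(pairs):
    """Hypothetical helper: SELECT key, count(distinct(value)) ... GROUP BY key."""
    seen = defaultdict(set)
    for key, value in pairs:
        seen[key].add(value)  # a set keeps each value once per key
    return {key: len(values) for key, values in seen.items()}


if __name__ == "__main__":
    print(count_per_key(rows))
    print(count_distinct_per_key(rows))
```

On a windowed DStream of (key, value) pairs, one possible approach (an untested assumption, not something confirmed in this thread) is to apply the same two steps per window: de-duplicate the pairs in each window's RDD, for example with `distinct()` inside `transform`, and then count the remaining pairs per key.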
