spark-user mailing list archives

From N B <nb.nos...@gmail.com>
Subject Counting distinct values for a key?
Date Sun, 19 Jul 2015 18:28:31 GMT
Hello,

How do I compute the equivalent of the following SQL query in Spark
Streaming? I will be applying it to a windowed DStream.

SELECT key, COUNT(DISTINCT value) FROM table GROUP BY key;

For example, given the following rows in the table:

 key | value
-----+-------
 k1  | v1
 k1  | v1
 k1  | v2
 k1  | v3
 k1  | v3
 k2  | vv1
 k2  | vv1
 k2  | vv2
 k2  | vv2
 k2  | vv2
 k3  | vvv1
 k3  | vvv1

the result should be:

 key | count
-----+-------
 k1  |     3
 k2  |     2
 k3  |     1
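
A minimal sketch of the logic, written in plain Python rather than Spark so it runs standalone: first drop duplicate (key, value) pairs, then count how many pairs survive for each key. On a windowed DStream the analogous steps would presumably be `transform(lambda rdd: rdd.distinct())` followed by `map(lambda kv: (kv[0], 1))` and `reduceByKey(...)`. That mapping is an assumption on my part, not a confirmed answer, and the function name `count_distinct_by_key` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def count_distinct_by_key(
    pairs: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """Return {key: number of distinct values} for (key, value) pairs (hypothetical helper)."""
    # Step 1: drop duplicate (key, value) pairs.
    # Assumed Spark analogue: dstream.transform(lambda rdd: rdd.distinct()).
    unique_pairs = set(pairs)
    # Step 2: count one per surviving pair for each key.
    # Assumed Spark analogue: .map(lambda kv: (kv[0], 1)).reduceByKey(add).
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, _value in unique_pairs:
        counts[key] += 1
    return dict(counts)


# The sample rows from the question.
rows = [
    ("k1", "v1"), ("k1", "v1"), ("k1", "v2"), ("k1", "v3"), ("k1", "v3"),
    ("k2", "vv1"), ("k2", "vv1"), ("k2", "vv2"), ("k2", "vv2"), ("k2", "vv2"),
    ("k3", "vvv1"), ("k3", "vvv1"),
]

for key, count in sorted(count_distinct_by_key(rows).items()):
    print(f"{key} | {count}")  # prints "k1 | 3", "k2 | 2", "k3 | 1"
```

Deduplicating whole pairs before counting reduces the per-key step to a plain sum, which is why this ordering fits a distinct-then-reduceByKey pipeline.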

Thanks
Nikunj
