spark-user mailing list archives

From N B <nb.nos...@gmail.com>
Subject Re: Counting distinct values for a key?
Date Mon, 20 Jul 2015 00:46:09 GMT
Hi Jerry,

It does not work directly, for two reasons:

1. I am trying to do this using Spark Streaming (windowed DStreams), and the
DataFrames API does not work with streaming yet.

2. The query I need has a "distinct" embedded in it, i.e. I am looking to
achieve the equivalent of

SELECT key, count(distinct(value)) from table group by key
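For illustration, here is a minimal plain-Python sketch (not Spark code) of why the two queries differ, using the example rows from the original question quoted below. `count(value)` counts every row for a key, whereas `count(distinct(value))` counts each value only once per key. The helper names are hypothetical.

```python
from collections import defaultdict

# Example rows from the original question, as (key, value) pairs.
ROWS = [
    ("k1", "v1"), ("k1", "v1"), ("k1", "v2"), ("k1", "v3"), ("k1", "v3"),
    ("k2", "vv1"), ("k2", "vv1"), ("k2", "vv2"), ("k2", "vv2"), ("k2", "vv2"),
    ("k3", "vvv1"), ("k3", "vvv1"),
]

def count_per_key(rows):
    """Like: SELECT key, count(value) FROM table GROUP BY key."""
    counts = defaultdict(int)
    for key, _value in rows:
        counts[key] += 1  # every row counts, duplicates included
    return dict(counts)

def count_distinct_per_key(rows):
    """Like: SELECT key, count(distinct(value)) FROM table GROUP BY key."""
    seen = defaultdict(set)
    for key, value in rows:
        seen[key].add(value)  # a set keeps each value once per key
    return {key: len(values) for key, values in seen.items()}

print(count_per_key(ROWS))           # {'k1': 5, 'k2': 5, 'k3': 2}
print(count_distinct_per_key(ROWS))  # {'k1': 3, 'k2': 2, 'k3': 1}
```

Only the second function reproduces the expected result table from the question.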

Thanks
Nikunj


On Sun, Jul 19, 2015 at 2:28 PM, Jerry Lam <chilinglam@gmail.com> wrote:

> You mean this does not work?
>
> SELECT key, count(value) from table group by key
>
>
>
> On Sun, Jul 19, 2015 at 2:28 PM, N B <nb.nospam@gmail.com> wrote:
>
>> Hello,
>>
>> How do I go about performing the equivalent of the following SQL query
>> in Spark Streaming? I will be using this on a windowed DStream.
>>
>> SELECT key, count(distinct(value)) from table group by key;
>>
>> So, for example, given the following dataset in the table:
>>
>>  key | value
>> -----+-------
>>  k1  | v1
>>  k1  | v1
>>  k1  | v2
>>  k1  | v3
>>  k1  | v3
>>  k2  | vv1
>>  k2  | vv1
>>  k2  | vv2
>>  k2  | vv2
>>  k2  | vv2
>>  k3  | vvv1
>>  k3  | vvv1
>>
>> the result will be:
>>
>>  key | count
>> -----+-------
>>  k1  |     3
>>  k2  |     2
>>  k3  |     1
>>
>> Thanks
>> Nikunj
>>
>>
>
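One common way to express this on a windowed DStream is to deduplicate the (key, value) pairs in each window's RDD before counting. In PySpark that would be roughly `windowed.transform(lambda rdd: rdd.distinct()).map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)`, assuming `windowed` (a hypothetical name) is a DStream of (key, value) tuples produced by `window()`. The plain-Python sketch below only simulates those three steps on lists standing in for micro-batches; `distinct_count_by_key` and the batch data are hypothetical.

```python
def distinct_count_by_key(pairs):
    """Simulate distinct() -> map to (key, 1) -> reduceByKey(+) on one window."""
    # Step 1: like RDD.distinct(), drop duplicate (key, value) pairs.
    unique_pairs = set(pairs)
    # Step 2: like map(lambda kv: (kv[0], 1)), keep the key with a count of 1.
    ones = [(key, 1) for key, _value in unique_pairs]
    # Step 3: like reduceByKey(lambda a, b: a + b), sum the ones per key.
    totals = {}
    for key, one in ones:
        totals[key] = totals.get(key, 0) + one
    return totals

# Hypothetical micro-batches; a window covering both is their concatenation.
batch_1 = [("k1", "v1"), ("k1", "v2"), ("k2", "vv1")]
batch_2 = [("k1", "v1"), ("k1", "v3"), ("k2", "vv1")]
window = batch_1 + batch_2

print(sorted(distinct_count_by_key(window).items()))  # [('k1', 3), ('k2', 1)]

# Summing per-batch distinct counts over-counts values seen in both batches.
per_batch_sum = {}
for batch in (batch_1, batch_2):
    for key, n in distinct_count_by_key(batch).items():
        per_batch_sum[key] = per_batch_sum.get(key, 0) + n
print(sorted(per_batch_sum.items()))  # [('k1', 4), ('k2', 2)]
```

Distinct counts are not additive across batches, so the deduplication has to see the whole window's data rather than combining per-batch results with a simple sum.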
