kafka-users mailing list archives

From Bruno Cadonna <br...@confluent.io>
Subject Re: Unique users per calendar month using kafka streams
Date Thu, 21 Nov 2019 11:37:14 GMT
Hi Chintan,

You cannot specify time windows based on calendar units such as months.
Time windows in Kafka Streams have a fixed size, and months vary in length.

In the following, I assume the keys of your records are user IDs. You
could extract the month from the timestamp of each event and add it
to the key of the record. Then you can group the records by key
and count them. Be aware that the state that stores the counts will
grow indefinitely, so you need to remove the counts you no longer
need from your local state.
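
To make the key idea concrete, here is a minimal plain-Java sketch. It
is not Kafka Streams code: the class MonthlyUserKey, the "userId@yyyy-MM"
key format, and the use of UTC for month boundaries are all assumptions
made for illustration. In a topology, the same key derivation would
presumably go into a selectKey() step before groupByKey() and count().

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Kafka Streams.
final class MonthlyUserKey {
    private MonthlyUserKey() {}

    // Builds a composite key "userId@yyyy-MM" from a user ID and an
    // epoch-millisecond event timestamp. UTC month boundaries are an assumption.
    static String of(String userId, long timestampMs) {
        YearMonth month = YearMonth.from(
                Instant.ofEpochMilli(timestampMs).atZone(ZoneOffset.UTC));
        return userId + "@" + month;
    }

    // Counts events per composite key; stands in for "group by key and count".
    static Map<String, Long> countByKey(String[] userIds, long[] timestampsMs) {
        Map<String, Long> counts = new HashMap<>();
        for (int i = 0; i < userIds.length; i++) {
            counts.merge(of(userIds[i], timestampsMs[i]), 1L, Long::sum);
        }
        return counts;
    }

    // Derives unique users per month: each composite key is one distinct
    // (user, month) pair, so counting keys per month gives the unique users.
    static Map<String, Integer> uniqueUsersPerMonth(Map<String, Long> counts) {
        Map<String, Integer> unique = new HashMap<>();
        for (String key : counts.keySet()) {
            String month = key.substring(key.lastIndexOf('@') + 1);
            unique.merge(month, 1, Integer::sum);
        }
        return unique;
    }
}
```

Each composite key appears once per user and month, however many events
that user produced, so the number of keys for a month is the number of
unique users in that month.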

Take a look at the following example of how to deduplicate records:

https://github.com/confluentinc/kafka-streams-examples/blob/5.3.1-post/src/test/java/io/confluent/examples/streams/EventDeduplicationLambdaIntegrationTest.java

It shows how to avoid unbounded growth of the local store in such cases.
Try to adapt it to your problem by extending the key with the
month and computing the count instead of looking for duplicates.
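
The retention idea can be sketched in plain Java as follows. This is
not the code from the linked example and it does not use a Kafka
Streams state store: the class BoundedMonthlyCounts and its
monthsToKeep parameter are hypothetical, and treating the newest month
seen so far as "now" is an assumption.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a state store with bounded retention.
final class BoundedMonthlyCounts {
    private final int monthsToKeep;
    // Months are kept sorted so that the oldest ones can be dropped cheaply.
    private final TreeMap<YearMonth, Map<String, Long>> countsByMonth = new TreeMap<>();

    BoundedMonthlyCounts(int monthsToKeep) {
        if (monthsToKeep < 1) {
            throw new IllegalArgumentException("monthsToKeep must be at least 1");
        }
        this.monthsToKeep = monthsToKeep;
    }

    // Records one event and then evicts months that fell out of retention.
    void record(String userId, long timestampMs) {
        YearMonth month = YearMonth.from(
                Instant.ofEpochMilli(timestampMs).atZone(ZoneOffset.UTC));
        countsByMonth.computeIfAbsent(month, m -> new HashMap<>())
                .merge(userId, 1L, Long::sum);
        // Keep only the newest monthsToKeep months, measured from the
        // latest month observed so far; headMap is exclusive of its bound.
        YearMonth oldestKept = countsByMonth.lastKey().minusMonths(monthsToKeep - 1);
        countsByMonth.headMap(oldestKept).clear();
    }

    // Number of distinct users seen in the given month, or 0 if it was evicted.
    int uniqueUsers(YearMonth month) {
        Map<String, Long> users = countsByMonth.get(month);
        return users == null ? 0 : users.size();
    }

    // Number of months currently held in memory.
    int trackedMonths() {
        return countsByMonth.size();
    }
}
```

With this scheme the state never holds more than monthsToKeep months,
at the cost of dropping events that arrive for a month that has already
been evicted.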

Best,
Bruno

On Thu, Nov 21, 2019 at 10:28 AM chintan mavawala
<chintan25487@gmail.com> wrote:
>
> Hi,
>
> We have a use case to capture the number of unique users per month. We planned
> to use the windowing concept for this.
>
> For example, group events from the input topic by user name and later subgroup
> them based on a time window. However, I don't see how I can subgroup the
> results by a particular month, say January. The only way is to subgroup
> based on time.
>
> Any pointers would be appreciated.
>
> Regards,
> Chintan
