kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Processor API, how to get last N hours word count
Date Wed, 04 Jul 2018 18:45:49 GMT
Hello Gleb,

For the first question, you should use a windowed store in your topology.

For both the first and second questions, I think interactive queries should
be fine: you can create windows of different lengths and use interactive
queries to get the counts over different window spans.
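
To illustrate the idea, here is a minimal plain-Java sketch. It is not the
Kafka Streams API: the names HourlyWordCounts, add and countLastHours are made
up for this example, and the one-hour tumbling window size is an assumption.
It models a windowed store as a map from window start time to count, so a
"last N hours" query is a sum over the windows that start in that range,
which is roughly what a time-range fetch on a windowed store returns.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of per-word counts kept in one-hour tumbling windows. */
class HourlyWordCounts {
    static final long HOUR_MS = 60L * 60L * 1000L;

    // word -> (window start timestamp in ms -> count inside that window)
    private final Map<String, TreeMap<Long, Long>> windows = new HashMap<>();

    /** Records one occurrence of the word at the given event timestamp (ms). */
    void add(String word, long timestampMs) {
        // Align the timestamp down to the start of its one-hour window.
        long windowStart = timestampMs - Math.floorMod(timestampMs, HOUR_MS);
        windows.computeIfAbsent(word, w -> new TreeMap<>())
               .merge(windowStart, 1L, Long::sum);
    }

    /**
     * Sums the counts of all windows whose start lies in
     * [nowMs - hours * HOUR_MS, nowMs]. Resolution is one window (one hour):
     * a window that starts before the lower bound is not included.
     */
    long countLastHours(String word, long nowMs, int hours) {
        TreeMap<Long, Long> perWindow = windows.get(word);
        if (perWindow == null) {
            return 0L;
        }
        long from = nowMs - hours * HOUR_MS;
        long total = 0L;
        for (long count : perWindow.subMap(from, true, nowMs, true).values()) {
            total += count;
        }
        return total;
    }
}
```

With this layout the 24h and 36h questions differ only in the range passed to
the query, e.g. countLastHours("user", now, 24) versus
countLastHours("user", now, 36); the stored windows stay the same.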


On Wed, Jul 4, 2018 at 8:34 AM, Gleb Stsenov <gleb.stsenov@gmail.com> wrote:

> Hello,
> Just started with Kafka, took 2.0 because it has better unit test support.
> Built custom processor, which is basically same as WordCountProcessorDemo
> example (github https://goo.gl/XSh7iW ) and can be treated as equal to
> that.
> Built topology by adding source topic, processor, state store, and sink (to
> topic).
> Played with KafkaTool and console consumers, can see my wordcounts since
> beginning of publisher life.
> *Question*:
> what would be the correct way to get wordcounts for last 24h?
> On topology creation, on processor init, or somehow related to "interactive
> queries" feature?
> First goal is answering to "what are the most common words during last 24h
> (having word count > configured_X)".
> Second goal is custom time window. Kinda if word "user" is the most common
> in 24h, I want to know its word count for last 36h, or so. Sounds like
> "interactive query" for me, but not sure.
> Read description of window types in doc, but can't get the idea of applying
> them to processor API.
> Thank you.
> BR,
> Gleb.

-- Guozhang
