kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hamza HACHANI <hamza.hach...@supcom.tn>
Subject RE: Kafka streams Issue
Date Fri, 29 Jul 2016 14:53:04 GMT
Hi Dave,

Could you explain a little bit much your idea ?
I can't figure out what you are suggesting.
Thank you

De : Tauzell, Dave <Dave.Tauzell@surescripts.com>
Envoyé : vendredi 29 juillet 2016 02:39:53
À : users@kafka.apache.org
Objet : RE: Kafka streams Issue

You could send the message immediately to an intermediary topic.  Then have a consumer of
that topic that pull messages off and waits until the minute is up.


Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com<http://www.surescripts.com> |   Dave.Tauzell@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube

-----Original Message-----
From: Hamza HACHANI [mailto:hamza.hachani@supcom.tn]
Sent: Friday, July 29, 2016 9:36 AM
To: users@kafka.apache.org
Subject: Kafka streams Issue

> Good morning,
> I'm an ICT student in TELECOM BRRETAGNE (a french school).
> I did follow your presentation in Youtube and i found them really
> intresting.
> I'm trying to do some stuffs with Kafka. And now it has been  about 3
> days that I'm blocked.
> I'm trying to control the time in which my processing application send
> data to the output topic .
> What i'm trying to do is to make the application process data from the
> input topic all the time but send the messages only at the end of a
> minute/an hour/a month .... (the notion of windowing).
> For the moment what i managed to do is that the application instead of
> sending data only at the end of the minute,it send it anytime it does
> receive it from the input topic.
> Have you any suggestions to help me?
> I would be really gratfeul.

Preliminary answer for now:

> For the moment what i managed to do is that the application instead of
sending data only at the end
> of the minute,it send it anytime it does receive it from the input topic.

This is actually the expected behavior at the moment.

The main reason for this behavior is that, in stream processing, we never know whether there
is still late-arriving data to be received.  For example, imagine you have 1-minute windows
based on event-time.  Here, it may happen that, after the first 1 minute window has passed,
another record arrives five minutes later but, according to the record's event-time, it should
have still been part of the first 1-minute window.  In this case, what we typically want to
happen is that the first 1-window will be updated/reprocessed with the late-arriving record
included.  In other words, just because 1 minute has passed (= the 1-minute window is "done")
it does not mean that actually all the data for that time interval has been processed already
-- so sending only a single update after 1 minute has passed would even produce incorrect
results in many cases.  For this reason you currently see a downstream update anytime there
is a new incoming data record ("send it anytime it does receive it from the input topic").
 So the point here is due ensure correctness of processing.

That said, one known drawback of the current behavior is that users haven't been able to control
(read: decrease/reduce) the rate/volume of the resulting downstream updates.  For example,
if you have an input topic with a rate of 1 million msg/s (which is easy for Kafka), some
users want to aggregate/window results primarily to reduce the input rate to a lower numbers
(e.g. 1 thousand msg/s) so that the data can be fed from Kafka to other systems that might
not scale as well as Kafka.  To help these use cases we will have a new configuration parameter
in the next major version of Kafka that allows you to control the rate/volume of downstream
Here, the point is to help users optimize resource usage rather than correctness of processing.
 This new parameter should also help you with your use case.  But even this new parameter
is not based on strict time behavior or time windows.

This e-mail and any files transmitted with it are confidential, may contain sensitive information,
and are intended solely for the use of the individual or entity to whom they are addressed.
If you have received this e-mail in error, please notify the sender by reply e-mail immediately
and destroy all copies of the e-mail and any attachments.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message