kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Garrett Barton <garrett.bar...@gmail.com>
Subject Re: Windowed aggregations memory requirements
Date Wed, 03 May 2017 17:51:55 GMT
That depends on if your using event, processing or ingestion time.

My understanding is that if you play a record through that is T-6, the only
way that TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))
would actually process that record in your window is if your using
processing time.  Otherwise the record is skipped and no data is
generated/calculated for that operation.  So depending on what your doing
you would not increase any more memory usage than when consuming from
real-time.

On Wed, May 3, 2017 at 3:37 AM, João Peixoto <joao.hartimer@gmail.com>
wrote:

> The base question I'm trying to answer is "how much memory does my instance
> need".
>
> Considering a use case where I want to keep a rolling average on a tumbling
> window of 1 minute size allowing for late arrivals up to 5 minutes (lower
> bound) we would have something like this:
>
> TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(
> TimeUnit.MINUTES.toMillis(5))
>
> The aggregate key size is 8 bytes, the average value is 8 bytes and for
> de-duplication purposes we need to keep track of which messages we saw
> already, so a list of keys averaging 10 entries.
>
> If I understand correctly this means that each window will be on average 96
> bytes in size.
>
> A single topic generates 100 messages/minute, which aggregate into 10
> independent windows.
>
> On any given point in time the windowed aggregates require 960 bytes of
> memory at least.
>
> Here's the confusing part. Lets say I found an issue with my averaging
> operation and I want to reprocess the last 10 hours worth of messages.
>
> 1. Windows will be regenerated, since most likely they were cleaned up
> already
> 2. The retention policy now has different semantics? If I had a late
> arrival of 6 minutes, all of the sudden the reprocessing will account for
> it right? Since the window is now active due to recreation (Assuming my app
> is capable of processing all messages under 5 minutes)
> 3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
> so my memory requirement is now 576,000 bytes? This is orders of magnitude
> bigger.
>
> I hope this gets my doubts across clearly, feel free to ask more details.
> And thanks in advance
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message