kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From João Peixoto <joao.harti...@gmail.com>
Subject Windowed aggregations memory requirements
Date Wed, 03 May 2017 07:37:11 GMT
The base question I'm trying to answer is "how much memory does my instance
need".

Considering a use case where I want to keep a rolling average on a tumbling
window of 1 minute size allowing for late arrivals up to 5 minutes (lower
bound) we would have something like this:

TimeWindows.of(TimeUnit.MINUTES.toMillis(1)).until(TimeUnit.MINUTES.toMillis(5))

The aggregate key size is 8 bytes, the average value is 8 bytes and for
de-duplication purposes we need to keep track of which messages we saw
already, so a list of keys averaging 10 entries.

If I understand correctly this means that each window will be on average 96
bytes in size.

A single topic generates 100 messages/minute, which aggregate into 10
independent windows.

On any given point in time the windowed aggregates require 960 bytes of
memory at least.

Here's the confusing part. Lets say I found an issue with my averaging
operation and I want to reprocess the last 10 hours worth of messages.

1. Windows will be regenerated, since most likely they were cleaned up
already
2. The retention policy now has different semantics? If I had a late
arrival of 6 minutes, all of the sudden the reprocessing will account for
it right? Since the window is now active due to recreation (Assuming my app
is capable of processing all messages under 5 minutes)
3. I'll be keeping 10 windows * (60 * 10) minutes for the first 5 minutes,
so my memory requirement is now 576,000 bytes? This is orders of magnitude
bigger.

I hope this gets my doubts across clearly, feel free to ask more details.
And thanks in advance

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message