kafka-users mailing list archives

From Nicolas Fouché <nfou...@onfocus.io>
Subject Kafka Streams: got bit by WindowedSerializer (only window.start is serialized)
Date Mon, 16 Jan 2017 11:17:58 GMT
Hi,

In the same topology, I generate aggregates with 1-day windows and 1-week
windows and write them in one single topic. On Mondays, these windows have
the same start time. As a result, these aggregates override each other.

That happens because WindowedSerializer [1] only serializes the window
start time. I'm a bit surprised: by definition, a window has both a start
and an end. I suppose the goal was to save on key size? And/or the
assumption is that a topic should not contain aggregates of different
granularities?
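To make the collision concrete, here is a minimal sketch that runs without Kafka. It assumes the start-only layout used by the 0.10.1 WindowedSerializer (inner key bytes followed by an 8-byte window start), with a UTF-8 string standing in for the inner key serializer. `SimpleWindow`, `serializeStartOnly` and the key `user-42` are hypothetical names for illustration only, not Kafka API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StartOnlyKeyCollision {

    // Hypothetical stand-in for a time window: start and end in epoch millis.
    static final class SimpleWindow {
        final long start;
        final long end;

        SimpleWindow(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumed start-only layout: [UTF-8 key bytes][8-byte window start].
    // The window end is never written.
    static byte[] serializeStartOnly(String key, SimpleWindow window) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(keyBytes.length + Long.BYTES);
        buf.put(keyBytes);
        buf.putLong(window.start);
        return buf.array();
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long monday = 1484524800000L; // 2017-01-16T00:00:00Z, a Monday
        SimpleWindow daily = new SimpleWindow(monday, monday + day);
        SimpleWindow weekly = new SimpleWindow(monday, monday + 7 * day);

        byte[] dailyKey = serializeStartOnly("user-42", daily);
        byte[] weeklyKey = serializeStartOnly("user-42", weekly);

        // Identical bytes mean identical record keys in the shared output topic
        System.out.println("keys identical: " + Arrays.equals(dailyKey, weeklyKey));
    }
}
```

Running it prints `keys identical: true`: the daily and weekly aggregates for the same key share one record key, so anything reading the topic as a table keeps only the latest of the two.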

So I have two options: either create one output topic per granularity,
or write my own serializer that also includes the window end time. What
would the community recommend?
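The second option could look roughly like the sketch below, which appends both the start and the end after the key bytes. It is kept Kafka-free so it runs on its own; in a real topology the same logic would presumably sit inside an implementation of Kafka's `Serializer<Windowed<K>>` that wraps an inner key serializer. The layout, `SimpleWindow`, `DecodedKey`, `serialize` and `deserialize` are all hypothetical choices for illustration, not an existing Kafka class.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StartEndWindowedKey {

    // Hypothetical stand-in for a time window (epoch millis).
    static final class SimpleWindow {
        final long start;
        final long end;

        SimpleWindow(long start, long end) {
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical decoded form of a windowed key.
    static final class DecodedKey {
        final String key;
        final long start;
        final long end;

        DecodedKey(String key, long start, long end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed layout: [UTF-8 key bytes][8-byte window start][8-byte window end].
    static byte[] serialize(String key, SimpleWindow window) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(keyBytes.length + 2 * Long.BYTES);
        buf.put(keyBytes);
        buf.putLong(window.start);
        buf.putLong(window.end);
        return buf.array();
    }

    // Inverse of serialize(): the fixed 16-byte suffix holds start and end,
    // everything before it is the key.
    static DecodedKey deserialize(byte[] data) {
        int keyLength = data.length - 2 * Long.BYTES;
        ByteBuffer buf = ByteBuffer.wrap(data);
        String key = new String(data, 0, keyLength, StandardCharsets.UTF_8);
        long start = buf.getLong(keyLength);
        long end = buf.getLong(keyLength + Long.BYTES);
        return new DecodedKey(key, start, end);
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long monday = 1484524800000L; // 2017-01-16T00:00:00Z, a Monday
        byte[] dailyKey = serialize("user-42", new SimpleWindow(monday, monday + day));
        byte[] weeklyKey = serialize("user-42", new SimpleWindow(monday, monday + 7 * day));

        // Different window ends now yield different record keys
        System.out.println("keys identical: " + Arrays.equals(dailyKey, weeklyKey));

        DecodedKey decoded = deserialize(weeklyKey);
        System.out.println(decoded.key + " window length in days: " + (decoded.end - decoded.start) / day);
    }
}
```

This prints `keys identical: false` and `user-42 window length in days: 7`. The cost is 8 extra bytes per key, and every consumer of the topic would need the matching decoder, since a deserializer expecting the start-only layout would misread the trailing bytes.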

Getting back to the core problem:
I can understand that it may not be "right" to store different
granularities in one topic; I thought it would save resources (fewer
topics for Kafka to manage). But I'm really not sure about this default
serializer: it does not serialize all instance variables of the `Window`
class, and more generally does not comply with the definition of a window.

[1]
https://github.com/apache/kafka/blob/0.10.1/streams/src/main/java/org/apache/kafka/streams/kstream/internals/WindowedSerializer.java

Thanks.
Nicolas
