kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Ward <tim.w...@origamienergy.com.INVALID>
Subject KSTREAM-AGGREGATE-STATE-STORE persistence?
Date Mon, 12 Aug 2019 12:38:02 GMT
I believe I have witnessed - at least twice - something like the following happening, in a
Kafka Streams application where I have a .groupByKey().windowedBy().aggregate() sequence.


  *   Application runs for a while
  *   Application crashes
  *   Application restarts
  *   Aggregator.apply() is called to aggregate an input message *that has already been included
in the aggregate*

It looks like when the application crashes, the KSTREAM-AGGREGATE-STATE-STORE has been persisted
after message X has been aggregated but message X has not been committed back to the original
source topic.

So on restart message X gets read and processed again, and gets aggregated a second time into
the same aggregate.

Now that I know this is happening (it was spotted by what I thought was some over-the-top
paranoid validation code) I can cope with it, and it is possible to make the aggregation operation
idempotent, because of the structure of the particular operation I'm doing ... but what if
the aggregation had been something like a simple counting or totalling operation? How would
anyone know the original input message(s) had been aggregated more than once?

So, my question:

Am I correct in diagnosing that persisting the state store and committing the original source
message are not carried out atomically, and one has to expect the same message can be applied
to the same aggregate multiple times, and if one cares about this one has to detect it happening
and make the aggregation process idempotent? I don't see this explained in the JavaDoc for
either Aggregator (or Reducer, where presumably it also applies).

Tim Ward

This email is from Origami Energy Limited. The contents of this email and any attachment are
confidential to the intended recipient(s). If you are not an intended recipient: (i) do not
use, disclose, distribute, copy or publish this email or its contents; (ii) please contact
Origami Energy Limited immediately; and then (iii) delete this email. For more information,
our privacy policy is available here: https://origamienergy.com/privacy-policy/. Origami Energy
Limited (company number 8619644) is a company registered in England with its registered office
at Ashcombe Court, Woolsack Way, Godalming, GU7 1LQ.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message