samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jarrad, Ken " <>
Subject checkpoint on flush of system producer
Date Tue, 06 Sep 2016 13:55:28 GMT
I am writing a SystemProducer and I would be grateful for any comments or documentation.

I want my SystemProducer to collect messages until flush is invoked.
That is, I don't want to transmit each message 'one by one'.
I want to wait until flush is invoked and then transmit 'in bulk'.

I don't know how to do this without violating the 'at least once' guarantee.

I am concerned that any message that is 'sent' from my StreamTask will be included in a checkpoint.
The message might not have been transmitted, however.
The message may reside in my SystemProducer pending flush.

If the StreamTask is re-started from the checkpoint then the message will not be replayed.
[The task input stream is a Kafka topic]

How do I co-ordinate the checkpoint mechanism with SystemProducer so that the checkpoint is
delayed until the message is flushed?

View raw message