I am writing a SystemProducer and I would be grateful for any comments or documentation.
I want my SystemProducer to collect messages until flush is invoked.
That is, I don't want to transmit each message 'one by one'.
I want to wait until flush is invoked and then transmit 'in bulk'.
I don't know how to do this without violating the 'at least once' guarantee.
I am concerned that any message that is 'sent' from my StreamTask will be included in a checkpoint.
The message might not have been transmitted, however.
The message may reside in my SystemProducer pending flush.
If the StreamTask is re-started from the checkpoint then the message will not be replayed.
[The task input stream is a Kafka topic]
How do I co-ordinate the checkpoint mechanism with SystemProducer so that the checkpoint is
delayed until the message is flushed?
|