kafka-users mailing list archives

From Tomoyuki Saito <aocch...@gmail.com>
Subject Batch processing with Kafka Streams with at-least-once semantics
Date Thu, 11 Oct 2018 06:15:44 GMT

I'm exploring whether it is possible to use Kafka Streams for batch
processing with at-least-once semantics.

What I want to do is insert records into an external storage in bulk, and
commit offsets after the bulk insertion to achieve at-least-once semantics.
A processing topology can be very simple, like:
TopologyBuilder builder = new TopologyBuilder();
builder.addSource("source", "topic");
builder.addProcessor("processor", processorSupplier, "source");
KafkaStreams streams = new KafkaStreams(builder, streamsConfig);
streams.start();
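For context, the batching logic I have in mind inside the processor looks roughly like this. It is a minimal, Kafka-free sketch: `BulkSink` is a hypothetical stand-in for the external storage client, and the `Runnable` stands in for `ProcessorContext#commit`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the intended processor logic, with the Kafka types replaced
// by stand-ins: BulkSink plays the role of the external storage, and the
// Runnable plays the role of ProcessorContext#commit.
class BatchingProcessor {
    interface BulkSink { void insertAll(List<String> records); }

    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final BulkSink sink;
    private final Runnable commitOffsets;

    BatchingProcessor(int batchSize, BulkSink sink, Runnable commitOffsets) {
        this.batchSize = batchSize;
        this.sink = sink;
        this.commitOffsets = commitOffsets;
    }

    // Called once per record, like Processor#process.
    void process(String record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Insert first, commit second: if the insert throws, offsets are not
    // committed and the records are re-delivered, giving at-least-once.
    void flush() {
        if (buffer.isEmpty()) return;
        sink.insertAll(new ArrayList<>(buffer));
        commitOffsets.run();  // would be context.commit() in Kafka Streams
        buffer.clear();
    }
}
```

The point of the ordering is that a failure between the bulk insert and the commit can only cause duplicates, never data loss.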

My questions are:

1. Could you suggest how to achieve that? Would it be better to use
KafkaConsumer instead of KafkaStreams?

2. From my understanding, when setting the StreamsConfig `commit.interval.ms`
to 0, we can turn off offset-commit by KafkaStreams internal logic (in
StreamThread#maybeCommit), and control when to commit offsets with
`ProcessorContext#commit`. Is my understanding right? Are there any expected
issues with this approach?
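Concretely, the configuration I'm considering is something like the following sketch. The `application.id` and `bootstrap.servers` values are illustrative placeholders; whether `commit.interval.ms` set to 0 really disables the interval-based commit is exactly what I'm asking about.

```java
import java.util.Properties;

class BulkInsertConfig {
    // Build a StreamsConfig-style Properties object that (per my reading of
    // StreamThread#maybeCommit) relies on explicit ProcessorContext#commit
    // calls only. application.id and bootstrap.servers are placeholders.
    static Properties build() {
        Properties p = new Properties();
        p.put("application.id", "bulk-insert-app");    // placeholder
        p.put("bootstrap.servers", "localhost:9092");  // placeholder
        p.put("commit.interval.ms", "0");              // intent: no interval-based commits
        return p;
    }
}
```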

Thank you,
