samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chad Greenberg <>
Subject Questions about checkpointing and Kinesis
Date Mon, 06 Feb 2017 22:57:44 GMT
Starting on an integration project between a Kinesis stream and Samza, despite have no background
in either, but I am familiar with most other messaging/queuing systems.

Decided to keep all state management within Samza instead of using Kinesis' client library.
My plan was to use the default KafkaCheckpointManagerFactory on an timed interval basis, but
I have a few questions.

What exactly is being checkpointed? What value can I retrieve to use as an offset for my Kinesis
stream? Or is this something I need to keep track of in a store? If so, what is the point
of checkpointing? Can I use RocksDb to save the Kinesis offset at every document (efficiently
that is)?

Related to Kinesis and not quite Samza, it does not have a listener/push framework, but it
requires constant polling (unless I missed something). First of all, I was going to have a
partition for each Kinesis shard partition. But the next question is, should I simply have
a while(true) polling method inside my consumer(BlockingEnvelopeMap)? Seems inefficient, even
with a timeout. How can I get new data to instantiate a new consumer? My consumer will put
a new document to my task.


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message