kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rafi Shamim <r...@knewton.com>
Subject Offset management in multi-threaded high-level consumer
Date Tue, 06 Jan 2015 18:45:14 GMT

I would like to write a multi-threaded consumer for the high-level
consumer in Kafka 0.8.1. I have found two ways that seem feasible
while keeping the guarantee that messages in a partition are processed
in order. I would appreciate any feedback this list has.

Option 1
- Create multiple threads, so each thread has its own ConsumerConnector.
- Manually commit offsets in each thread after every N messages.
- This was discussed a bit on this list previously. See [1].

### Questions
- Is there a problem with making multiple ConsumerConnectors per machine?
- What does it take for ZooKeeper to handle this much load? We have a
3-node ZooKeeper cluster with relatively small machines. (I expect the
topic will have about 40 messages per second. There will be 3 consumer
groups. That would be 120 commits per second at most, but I can reduce
the frequency of commits to make this lower.)

### Extra info
Kafka 0.9 will have an entirely different commit API, which will allow
one connection to commit offsets per partition, but I can’t wait that
long. See [2].

Option 2
- Create one ConsumerConnector, but ask for multiple streams in that
connection. Give each thread one stream.
- Since there is no way to commit offsets per stream right now, we
need to do autoCommit.
- This sacrifices the at-least-once processing guarantee, which would
be nice to have. See KAFKA-1612 [3].

### Extra info
- There was some discussion in KAFKA-996 about a markForCommit()
method so that autoCommit would preserve the at-least-once guarantee,
but it seems more likely that the consumer API will just be redesigned
to allow commits per partition instead. See [4].

So basically I'm wondering if option 1 is feasible. If not, I'll just
do option 2. Of course, let me know if I was mistaken about anything
or if there is another design which is better.

Thanks in advance.

[1] http://mail-archives.apache.org/mod_mbox/kafka-users/201310.mbox/%3CFF142F6B499AE34CAED4D263F6CA32901D35AE89@EXTXMB19.nam.nsroot.net%3E
[2] https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ConsumerAPI
[3] https://issues.apache.org/jira/browse/KAFKA-1612
[4] https://issues.apache.org/jira/browse/KAFKA-966

View raw message