kafka-users mailing list archives

From "Hargett, Phil" <phil.harg...@mirror-image.com>
Subject 0.8 behavior change: consumer "re-receives" last batch of messages in a topic?
Date Wed, 13 Mar 2013 18:49:37 GMT
I have 2 consumers in our scenario, reading from different brokers. Each broker is running
standalone, although each has its own dedicated zookeeper instance for bookkeeping.

After switching from 0.7.2, I noticed that both consumers exhibited high CPU usage. I am not
yet exploiting any zookeeper knowledge in my consumer code; I am just making calls to the
SimpleConsumer in the java API, passing the host and port of my broker. 

In 0.7.2, I kept the last offset from messages received via a fetch, and used that as the
offset passed into the fetch method when receiving the next message set.

With 0.8, I had to add a check to drop fetched messages when the message's offset was less
than my own offset, based on the last message I saw. If I didn't make that change, it seemed
like the last 200 or so messages in my topic (probably matches a magic batch size configured
somewhere in all of this code) were continually refetched.
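
A minimal sketch of the check described above, in case it helps clarify what I mean. It does not use the actual Kafka classes: FetchedMessage and OffsetTracker are hypothetical stand-ins, and the sketch assumes that offsets increase by one per message, so that the next offset to request is the last seen offset plus one. That assumption is not something I have confirmed against the 0.8 code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a fetched message; NOT a Kafka class. */
final class FetchedMessage {
    final long offset;
    final String payload;

    FetchedMessage(long offset, String payload) {
        this.offset = offset;
        this.payload = payload;
    }
}

/** Hypothetical helper that tracks the consumer's own position in the topic. */
final class OffsetTracker {
    // Next offset we expect to consume (assumption: offsets step by one per message).
    private long nextOffset;

    OffsetTracker(long startOffset) {
        this.nextOffset = startOffset;
    }

    long nextOffset() {
        return nextOffset;
    }

    /** Returns only the messages at or beyond our position, and advances the position. */
    List<FetchedMessage> accept(List<FetchedMessage> batch) {
        List<FetchedMessage> fresh = new ArrayList<>();
        for (FetchedMessage m : batch) {
            if (m.offset < nextOffset) {
                continue; // already seen: drop the re-delivered message
            }
            fresh.add(m);
            nextOffset = m.offset + 1;
        }
        return fresh;
    }
}
```

With this in place, a fetch that returns the same trailing batch again yields an empty list from accept(), and nextOffset() stays where it was.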

In this scenario, my topic was no longer accumulating messages, as I had turned off the producer,
so I was expecting the fetches to eventually either block, return an empty message set, or
fail (not sure of semantics of fetch). Continually receiving the last "batch" of messages
at the end of the topic was not a semantic I expected.

Is this an intended change in behavior, or do I need to write better consumer code?

Guidance, please.
