kafka-users mailing list archives

From James Cheng <wushuja...@gmail.com>
Subject Re: Is this a bug or just unintuitive behavior?
Date Thu, 05 Jan 2017 08:40:38 GMT

Your analysis is correct. I would say that it is known but unintuitive behavior.

As an example of a problem caused by this behavior, it's possible for MirrorMaker to miss
messages on newly created topics, even though it was subscribed to them before the topics were
created.

See the following JIRAs:
https://issues.apache.org/jira/browse/KAFKA-3848
https://issues.apache.org/jira/browse/KAFKA-3370


> On Jan 4, 2017, at 4:37 PM, hans@confluent.io wrote:
> This sounds exactly as I would expect things to behave. If you consume from the beginning
> I would think you would get all the messages but not if you consume from the latest offset.
> You can separately tune the metadata refresh interval if you want to miss fewer messages but
> that still won't get you all messages from the beginning if you don't explicitly consume from
> the beginning.
> Sent from my iPhone
>> On Jan 4, 2017, at 6:53 PM, Jeff Widman <jeff@netskope.com> wrote:
>> I'm seeing consumers miss messages when they subscribe before the topic is
>> actually created.
>> Scenario:
>> 1) kafka cluster with no topics, but supports topic
>> auto-creation as soon as a message is published to the topic
>> 2) consumer subscribes using topic string or a regex pattern. Currently no
>> topics match. Consumer offset is "latest"
>> 3) producer publishes to a topic that matches the string or regex pattern.
>> 4) broker immediately creates a topic, writes the message, and also
>> notifies the consumer group that a rebalance needs to happen to assign the
>> topic_partition to one of the consumers.
>> 5) rebalance is fairly quick, maybe a second or so
>> 6) a consumer is assigned to the newly-created topic_partition
>> At this point, we've got a consumer steadily polling the recently created
>> topic_partition. However, the consumer.poll() never returns any messages
>> published between topic creation and when the consumer was assigned to the
>> topic_partition. I'm guessing this may be because when the consumer is
>> assigned to the topic_partition it doesn't find any committed offset, so
>> it uses the latest offset, which happens to be after the messages that
>> were published to create the topic.
>> This is surprising because the consumer technically was subscribed to the
>> topic before the messages were produced, so you'd think the consumer would
>> receive these messages.
>> Is this known behavior? A bug in Kafka broker? Or a bug in my client
>> library?
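
The scenario in the quoted message can be sketched as a toy model in plain Python. This is
not the Kafka client API: ToyPartition, starting_offset, poll and run_scenario are
hypothetical names made up for illustration, and the only behavior modeled is the one
discussed in this thread, namely that a consumer with no committed offset starts at the log
end under "latest" and at offset 0 under "earliest".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyPartition:
    """Append-only list standing in for one topic partition (not a Kafka class)."""
    messages: List[str] = field(default_factory=list)

    def append(self, message: str) -> None:
        self.messages.append(message)

    @property
    def log_end_offset(self) -> int:
        # Offset that the next appended message would receive.
        return len(self.messages)


def starting_offset(partition: ToyPartition,
                    committed: Optional[int],
                    reset_policy: str) -> int:
    """Pick the position a newly assigned consumer starts reading from."""
    if committed is not None:
        # A committed offset always wins over the reset policy.
        return committed
    if reset_policy == "earliest":
        return 0
    if reset_policy == "latest":
        return partition.log_end_offset
    raise ValueError(f"unknown reset policy: {reset_policy!r}")


def poll(partition: ToyPartition, position: int) -> List[str]:
    """Return every message at or after the given position."""
    return partition.messages[position:]


def run_scenario(reset_policy: str) -> List[str]:
    """Replay steps 3 to 6 of the scenario and return what the consumer sees."""
    partition = ToyPartition()
    # Steps 3-4: the first publish auto-creates the topic and writes the message.
    partition.append("m0")
    # Step 5: another message arrives while the rebalance is still in progress.
    partition.append("m1")
    # Step 6: the consumer is assigned; it has no committed offset for this partition.
    position = starting_offset(partition, committed=None, reset_policy=reset_policy)
    # A message published after the assignment.
    partition.append("m2")
    return poll(partition, position)


if __name__ == "__main__":
    print("latest:  ", run_scenario("latest"))
    print("earliest:", run_scenario("earliest"))
```

In this model the "latest" run skips m0 and m1, which matches the missed messages described
in the scenario, while the "earliest" run returns all three messages.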
