kafka-users mailing list archives

From Ewen Cheslack-Postava <e...@confluent.io>
Subject Re: Is this a bug or just unintuitive behavior?
Date Fri, 06 Jan 2017 03:55:57 GMT
The basic issue here is just that the auto.offset.reset defaults to latest,
right? That's not a very good setting for a mirroring tool and this seems
like something we might just want to change the default for. It's debatable
whether it would even need a KIP.

We already override several other settings in MM when they aren't set
explicitly, because the normal defaults aren't what a mirroring tool wants.
Most are producer properties to avoid duplicates (the acks, retries,
max.block.ms, and max.in.flight.requests.per.connection settings), but there
are a couple of consumer ones too (auto.commit.enable and
consumer.timeout.ms).

This is probably something like a 1-line MM patch if someone wants to
tackle it -- whether it needs a KIP is, unfortunately, the more complicated
question :(

-Ewen

On Thu, Jan 5, 2017 at 1:10 PM, James Cheng <wushujames@gmail.com> wrote:

>
> > On Jan 5, 2017, at 12:57 PM, Jeff Widman <jeff@netskope.com> wrote:
> >
> > Thanks James and Hans.
> >
> > Will this also happen when we expand the number of partitions in a topic?
> >
> > That also will trigger a rebalance, the consumer won't subscribe to the
> > partition until the rebalance finishes, etc.
> >
> > So it'd seem that any messages published to the new partition in between
> > the partition creation and the rebalance finishing won't be consumed by
> any
> > consumers that have offset=latest
> >
>
> It hadn't occurred to me until you mentioned it, but yes, I think it'd also
> happen in those cases.
>
> In the kafka consumer javadocs, they provide a list of things that would
> cause a rebalance:
> http://kafka.apache.org/0101/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#subscribe(java.util.Collection,%20org.apache.kafka.clients.consumer.ConsumerRebalanceListener)
>
> "As part of group management, the consumer will keep track of the list of
> consumers that belong to a particular group and will trigger a rebalance
> operation if one of the following events trigger -
>
> Number of partitions change for any of the subscribed list of topics
> Topic is created or deleted
> An existing member of the consumer group dies
> A new member is added to an existing consumer group via the join API
> "
>
> I'm guessing that this would affect any of those scenarios.
>
> -James
>
>
> >
> >
> >
> > On Thu, Jan 5, 2017 at 12:40 AM, James Cheng <wushujames@gmail.com> wrote:
> >
> >> Jeff,
> >>
> >> Your analysis is correct. I would say that it is known but unintuitive
> >> behavior.
> >>
> >> As an example of a problem caused by this behavior, it's possible for
> >> mirrormaker to miss messages on newly created topics, even though it
> >> was subscribed to them before the topics were created.
> >>
> >> See the following JIRAs:
> >> https://issues.apache.org/jira/browse/KAFKA-3848
> >> https://issues.apache.org/jira/browse/KAFKA-3370
> >>
> >> -James
> >>
> >>> On Jan 4, 2017, at 4:37 PM, hans@confluent.io wrote:
> >>>
> >>> This sounds exactly as I would expect things to behave. If you
> >>> consume from the beginning I would think you would get all the
> >>> messages, but not if you consume from the latest offset. You can
> >>> separately tune the metadata refresh interval if you want to miss
> >>> fewer messages, but that still won't get you all messages from the
> >>> beginning if you don't explicitly consume from the beginning.
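
[For reference, the metadata refresh interval mentioned above is the
consumer's metadata.max.age.ms setting; a sketch of the relevant consumer
properties follows. The values are illustrative, not recommendations.]

```properties
# consumer.properties (illustrative values)
auto.offset.reset=earliest      # consume newly created topics from the beginning
metadata.max.age.ms=30000       # refresh metadata every 30s (default is 300000)
```

A shorter metadata.max.age.ms shrinks the window between topic creation and
the consumer noticing it, but with auto.offset.reset=latest some messages in
that window can still be missed.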
> >>>
> >>> Sent from my iPhone
> >>>
> >>>> On Jan 4, 2017, at 6:53 PM, Jeff Widman <jeff@netskope.com> wrote:
> >>>>
> >>>> I'm seeing consumers miss messages when they subscribe before the
> >>>> topic is actually created.
> >>>>
> >>>> Scenario:
> >>>> 1) kafka 0.10.1.1 cluster with no topics, but with topic
> >>>> auto-creation enabled, so a topic is created as soon as a message is
> >>>> published to it
> >>>> 2) consumer subscribes using a topic string or a regex pattern.
> >>>> Currently no topics match. Consumer auto.offset.reset is "latest"
> >>>> 3) producer publishes to a topic that matches the string or regex
> >>>> pattern.
> >>>> 4) broker immediately creates the topic, writes the message, and
> >>>> notifies the consumer group that a rebalance needs to happen to
> >>>> assign the topic_partition to one of the consumers.
> >>>> 5) rebalance is fairly quick, maybe a second or so
> >>>> 6) a consumer is assigned to the newly-created topic_partition
> >>>>
> >>>> At this point, we've got a consumer steadily polling the recently
> >>>> created topic_partition. However, consumer.poll() never returns any
> >>>> messages published between topic creation and when the consumer was
> >>>> assigned to the topic_partition. I'm guessing this may be because
> >>>> when the consumer is assigned to the topic_partition it doesn't find
> >>>> any committed offsets, so it uses the latest offset, which happens
> >>>> to be after the messages that were published to create the topic.
> >>>>
> >>>> This is surprising because the consumer technically was subscribed
> >>>> to the topic before the messages were produced, so you'd think the
> >>>> consumer would receive these messages.
> >>>>
> >>>> Is this known behavior? A bug in the Kafka broker? Or a bug in my
> >>>> client library?
> >>
> >>
>
>
