kafka-users mailing list archives

From Abhinav Solan <abhinav.so...@gmail.com>
Subject Re: Kafka Consumer consuming large number of messages
Date Wed, 04 May 2016 21:30:19 GMT
Thanks a lot, Jens, for the reply.
One thing is still unclear: does this happen only when
max.partition.fetch.bytes is set to a higher value? I am setting it
quite low, at only 8192, because I can control the size of the
data coming into Kafka. So even after setting this value, why is the
consumer fetching more records? Is the consumer not honoring this
property, or is there some other logic that makes it fetch more data?
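To see why the polls can still be large on 0.9, here is a back-of-the-envelope sketch. The key point (my reading, not stated explicitly in the thread) is that max.partition.fetch.bytes caps bytes per partition per fetch, so one poll() can return records from every partition the consumer owns. The 17-partition assignment (200 partitions / 12 consumers) and the 250-byte average record size are illustrative assumptions:

```java
// Rough upper bound on records one poll() might return on a 0.9 client.
// Assumption: max.partition.fetch.bytes limits bytes PER PARTITION per
// fetch, and a poll can include data from all owned partitions.
public class FetchEstimate {
    static long maxRecordsPerPoll(long fetchBytesPerPartition,
                                  int partitionsOwned,
                                  long avgRecordBytes) {
        // Records that fit under the per-partition byte cap...
        long perPartition = fetchBytesPerPartition / avgRecordBytes;
        // ...multiplied across every partition this consumer owns.
        return perPartition * partitionsOwned;
    }

    public static void main(String[] args) {
        // ~17 partitions per consumer (200 / 12), ~250-byte records:
        // 8192 / 250 = 32 records per partition, times 17 partitions.
        System.out.println(maxRecordsPerPoll(8192, 17, 250)); // prints 544
    }
}
```

So even with the byte cap honored exactly, a single poll can plausibly return hundreds of records, which matches the behavior described below.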

Thanks,
Abhinav

On Wed, May 4, 2016 at 1:40 PM Jens Rantil <jens.rantil@tink.se> wrote:

> Hi,
>
> This is a known issue. The 0.10 release will fix this. See
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-41%3A+KafkaConsumer+Max+Records
> for some background.
>
> Cheers,
> Jens
>
> On Wed, May 4, 2016 at 19:32 Abhinav Solan <abhinav.solan@gmail.com> wrote:
>
> > Hi,
> >
> > I am using kafka-0.9.0.1 and have configured the Kafka consumer to fetch
> > 8192 bytes by setting max.partition.fetch.bytes.
> >
> > Here are the properties I am using
> >
> > props.put("bootstrap.servers", servers);
> > props.put("group.id", "perf-test");
> > props.put("offset.storage", "kafka");
> > props.put("enable.auto.commit", "false");
> > props.put("session.timeout.ms", 60000);
> > props.put("request.timeout.ms", 70000);
> > props.put("heartbeat.interval.ms", 50000);
> > props.put("auto.offset.reset", "latest");
> > props.put("max.partition.fetch.bytes", "8192");
> > props.put("key.deserializer",
> >     "org.apache.kafka.common.serialization.StringDeserializer");
> > props.put("value.deserializer",
> >     "org.apache.kafka.common.serialization.StringDeserializer");
> >
> > I am setting up 12 consumers with 4 workers each to listen on a topic
> > with 200 partitions.
> > I have also enabled compression when sending to Kafka.
> >
> > The problem I am getting is that even though the fetch size is small,
> > the consumers poll too many records. If the topic has many messages and
> > the consumer is behind in consumption, it fetches a larger batch; if
> > the consumer is not behind, it fetches around 45 records. Either way,
> > if I set max.partition.fetch.bytes, shouldn't the fetch size have an
> > upper limit? Is there any other setting I am missing here?
> > I am controlling the message size myself, so it is not that some larger
> > messages are coming through; each message should be only around 200-300
> > bytes.
> >
> > Due to the large number of messages polled, the inner processing
> > sometimes cannot finish within the heartbeat interval, which makes
> > consumer rebalancing kick in again and again. This only happens when
> > the consumer is far behind in offset, e.g. when there are 100,000
> > messages to be processed in the topic.
> >
> > Thanks
> >
> --
>
> Jens Rantil
> Backend Developer @ Tink
>
> Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
> For urgent matters you can reach me at +46-708-84 18 32.
>
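For reference, the KIP-41 change Jens links to adds a max.poll.records setting on 0.10+ clients, which caps how many records a single poll() returns regardless of how many bytes the fetch brought back. A minimal sketch of the relevant properties; the value 100 and the localhost broker address are illustrative placeholders, not recommendations from this thread:

```java
import java.util.Properties;

// Sketch of a 0.10+ consumer config using the KIP-41 poll cap.
public class PollCapConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "perf-test");
        props.put("enable.auto.commit", "false");
        props.put("max.partition.fetch.bytes", "8192"); // per-partition byte cap, as before
        props.put("max.poll.records", "100");           // new in 0.10 via KIP-41:
                                                        // hard cap on records per poll()
        return props;
    }

    public static void main(String[] args) {
        System.out.println(consumerProps().getProperty("max.poll.records"));
    }
}
```

With the poll size bounded this way, each batch can be processed well within the heartbeat interval, which avoids the repeated rebalancing described earlier in the thread.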
