kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: New consumer not fetching as quickly as possible
Date Wed, 02 Dec 2015 18:04:10 GMT
Thanks for the updates, Tao.

Just wanted to make sure that there are no other potential issues when the
consumer and broker are remote, which is also quite common in practice: if
you increase the timeout value in poll(timeout) to an even larger value (say
two times the average latency in your network) and also set the
request.timeout.ms config to be large enough, does that resolve the
issue even when your consumer is not co-located?
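
A minimal sketch of the two settings mentioned above, assuming a Java client. The class name, method names, broker address, group id, and the numeric values (150 ms latency, 60000 ms request timeout) are hypothetical and only there for illustration; only request.timeout.ms and the poll(timeout) argument come from the discussion. The block uses java.util.Properties alone, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

// Hypothetical helper: names and numeric values are illustrative assumptions.
final class RemoteConsumerTimeouts {

    // Poll timeout following the "two times the average latency" rule of thumb.
    static long pollTimeoutMs(long avgNetworkLatencyMs) {
        return 2 * avgNetworkLatencyMs;
    }

    // Consumer properties with an explicit, generous request.timeout.ms.
    static Properties consumerProps(String bootstrapServers, String groupId, int requestTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        props.setProperty("request.timeout.ms", Integer.toString(requestTimeoutMs));
        return props;
    }

    public static void main(String[] args) {
        long timeout = pollTimeoutMs(150); // assumed average latency of 150 ms
        Properties props = consumerProps("broker.example.com:9092", "demo-group", 60000);
        System.out.println("poll timeout ms: " + timeout);
        System.out.println("request.timeout.ms: " + props.getProperty("request.timeout.ms"));
        // With kafka-clients available, props (plus key/value deserializers)
        // would be passed to the consumer and timeout to consumer.poll(timeout).
    }
}
```

Deriving the poll timeout from a measured latency rather than hard-coding it keeps the test repeatable when the consumer is moved between hosts.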

Guozhang

On Wed, Dec 2, 2015 at 12:46 AM, tao xiao <xiaotao183@gmail.com> wrote:

> It turned out it was due to network latency between the consumer and the
> broker. Once I moved the consumer to the same box as the broker, messages
> were returned in every poll.
>
> Thanks for all the help.
>
> On Wed, 2 Dec 2015 at 15:38 Gerard Klijs <gerard.klijs@dizzit.com> wrote:
>
> > Another possible reason which comes to mind is that you have multiple
> > consumer threads, but not the partitions/brokers to support them. When
> > I'm running my tool on multiple threads I get a lot of time-outs. When I
> > only use one consumer thread I get them only at the start and the end.
> >
> > On Wed, Dec 2, 2015 at 5:43 AM Jason Gustafson <jason@confluent.io> wrote:
> >
> > > There is some initial overhead before data can be fetched. For example,
> > > the group has to be joined and topic metadata has to be fetched. Do you
> > > see unexpected empty fetches beyond the first 10 polls?
> > >
> > > Thanks,
> > > Jason
> > >
> > > On Tue, Dec 1, 2015 at 7:43 PM, tao xiao <xiaotao183@gmail.com> wrote:
> > >
> > > > Hi Jason,
> > > >
> > > > You are correct. I initially produced 10000 messages in Kafka before I
> > > > started up my consumer with auto.offset.reset=earliest. But like I said,
> > > > the majority of the first 10 polls returned 0 messages and the lag
> > > > remained above 0, which means I still had enough messages to consume.
> > > > BTW I commit offsets manually, so the lag should accurately reflect how
> > > > many messages remain.
> > > >
> > > > I will turn on debug logging and test again.
> > > >
> > > > On Wed, 2 Dec 2015 at 07:17 Jason Gustafson <jason@confluent.io> wrote:
> > > >
> > > > > Hey Tao, other than high latency between the brokers and the consumer,
> > > > > I'm not sure what would cause this. Can you turn on debug logging and
> > > > > run again? I'm looking for any connection problems or metadata/fetch
> > > > > request errors. And I have to ask a dumb question: how do you know that
> > > > > more messages are available? Are you monitoring the consumer's lag?
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Tue, Dec 1, 2015 at 10:07 AM, Gerard Klijs <gerard.klijs@dizzit.com> wrote:
> > > > >
> > > > > > Thanks Tao, it worked.
> > > > > > I also played around with my test setting trying to replicate your
> > > > > > results, using default settings. But as long as the poll timeout is
> > > > > > set to 100ms or larger, the only time-outs I get are near the start
> > > > > > and near the end (when indeed there is nothing to consume). This is
> > > > > > with a producer putting out 1000 messages a second. Maybe the load of
> > > > > > the producer you're using is not constant? Maybe you could run a test
> > > > > > with the org.apache.kafka.tools.ProducerPerformance class to see if it
> > > > > > makes a difference?
> > > > > >
> > > > > > On Tue, Dec 1, 2015 at 11:35 AM tao xiao <xiaotao183@gmail.com> wrote:
> > > > > >
> > > > > > > Gerard,
> > > > > > >
> > > > > > > In your case I think you can set fetch.min.bytes=1 so that the server
> > > > > > > will answer the fetch request as soon as a single byte of data is
> > > > > > > available instead of accumulating enough messages.
> > > > > > >
> > > > > > > But in my case I have plenty of messages in the broker and I am sure
> > > > > > > the total message size is much larger than the default setting, which
> > > > > > > is 1024 bytes, but still the consumer doesn't return messages for
> > > > > > > every poll.
> > > > > > >
> > > > > > >
> > > > > > > On Tue, 1 Dec 2015 at 18:29 Gerard Klijs <gerard.klijs@dizzit.com> wrote:
> > > > > > >
> > > > > > > > I was experimenting with the timeout setting, but as long as
> > > > > > > > messages are produced and the consumer(s) keep polling I saw little
> > > > > > > > difference. I did see, for example, that when producing only 1
> > > > > > > > message a second, it still sometimes waits to get three messages.
> > > > > > > > So I also would like to know if there is a faster way.
> > > > > > > >
> > > > > > > > On Tue, Dec 1, 2015 at 10:35 AM tao xiao <xiaotao183@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi team,
> > > > > > > > >
> > > > > > > > > I am using the new consumer with broker version 0.9.0. I notice
> > > > > > > > > that poll(time) occasionally returns 0 messages even though I
> > > > > > > > > have enough messages in the broker. The rate of returning 0
> > > > > > > > > messages is quite high: about 4 out of 5 polls return 0 messages.
> > > > > > > > > Increasing the poll timeout from 300ms to 1 second doesn't help.
> > > > > > > > > Are there any configurations that I can tune to fetch data as
> > > > > > > > > quickly as possible?
> > > > > > > > >
> > > > > > > > > Both consumer and broker configs are the defaults.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
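
For anyone reproducing the symptom from the quoted thread, here is a rough sketch of two pieces discussed there: the fetch.min.bytes=1 setting, and a way to count empty polls while skipping the warm-up polls (group join, metadata fetch). The class name, method names, and sample counts are hypothetical. In a real test, the per-poll record counts would come from the size of each poll() result, which is assumed here rather than shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: names and sample data are illustrative assumptions.
final class EmptyPollCheck {

    // fetch.min.bytes=1 asks the broker to answer as soon as any data is available.
    static Properties lowLatencyFetchProps() {
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", "1");
        return props;
    }

    // Fraction of polls that returned no records, ignoring the first warmupPolls.
    // Returns 0.0 when no polls remain after the warm-up is skipped.
    static double emptyPollRatio(List<Integer> recordsPerPoll, int warmupPolls) {
        int counted = 0;
        int empty = 0;
        for (int i = Math.max(0, warmupPolls); i < recordsPerPoll.size(); i++) {
            counted++;
            if (recordsPerPoll.get(i) == 0) {
                empty++;
            }
        }
        return counted == 0 ? 0.0 : (double) empty / counted;
    }

    public static void main(String[] args) {
        // Made-up record counts for nine polls; the first four count as warm-up.
        List<Integer> counts = Arrays.asList(0, 0, 0, 25, 0, 0, 0, 0, 30);
        System.out.println("fetch.min.bytes: " + lowLatencyFetchProps().getProperty("fetch.min.bytes"));
        System.out.println("empty poll ratio: " + emptyPollRatio(counts, 4));
    }
}
```

A ratio that stays high after the warm-up polls, while the lag is still above zero, matches the behaviour reported in the original question.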



-- 
-- Guozhang
