kafka-users mailing list archives

From Ian Friedman <...@flurry.com>
Subject Re: Consumer throughput imbalance
Date Sun, 25 Aug 2013 17:22:11 GMT
Sorry, I reread what I've written so far and found that it doesn't state the actual problem
very well. Let me clarify once more:

The problem we're trying to solve is that we can't let messages go unprocessed for unbounded
amounts of time, and something about what we're doing (I suspect it's the fact that consumers
own several partitions but consume from only one of them at a time until it's caught up) is
causing a small number of messages to sit around for hours and hours. This happens even
though some consumers are idle because they are fully caught up on the partitions they own.
We've found that requeueing the oldest messages (consumers ignore messages that have already
been processed) is fairly effective at clearing them, but I'm looking for a more stable
solution.
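One way to picture the suspected cause: if a consumer drains one owned partition until it is caught up before moving on, a message at the head of another partition waits behind that whole backlog. The sketch below is a simulation only, not Kafka's consumer iterator; it assumes in-memory queues per partition and a hypothetical `batch_size`, and takes at most that many messages from each partition per turn, which bounds how long any partition's head message waits.

```python
from collections import deque
from typing import Deque, Dict, Iterator, Tuple


def drain_round_robin(
    queues: Dict[int, Deque[str]], batch_size: int
) -> Iterator[Tuple[int, str]]:
    """Yield (partition, message) pairs, taking at most `batch_size`
    messages from each non-empty partition per turn."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Partitions that still have pending messages, in visiting order.
    active = deque(p for p in sorted(queues) if queues[p])
    while active:
        p = active.popleft()
        q = queues[p]
        for _ in range(min(batch_size, len(q))):
            yield p, q.popleft()
        if q:  # still backlogged: go to the back of the line
            active.append(p)


# Partition 0 is backlogged; partition 1 holds a single message.
pending = {0: deque(["a0", "a1", "a2", "a3"]), 1: deque(["b0"])}
order = list(drain_round_robin(pending, batch_size=2))
print(order)  # [(0, 'a0'), (0, 'a1'), (1, 'b0'), (0, 'a2'), (0, 'a3')]
```

With `batch_size=2`, the lone message on partition 1 is handled third instead of after all four messages on partition 0.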

-- 
Ian Friedman


On Sunday, August 25, 2013 at 1:15 PM, Ian Friedman wrote:

> When I said "some messages take longer than others," that may have been misleading. What
> I meant is that the performance of the entire application is inconsistent, mostly due
> to pressure from other applications (MapReduce) on our HBase and MySQL backends. On top of
> that, some messages just contain more data. I suppose what you're suggesting is that I
> segment my messages by the average or expected time their payloads take to process, but
> I suspect that if I do that, several consumers will sit idle most of the time while the
> rest stay inconsistently backlogged, just as they are now. The problem isn't so much the
> size of the payloads as the fact that some messages, which I suspect are in partitions
> with lots of longer-running processing tasks, sit around for hours without getting
> consumed. That's what I'm trying to solve.
> 
> Is there any way to "add more consumers" without actually adding more consumer JVM
> processes? We've hit something of a saturation point on our MySQL database. Is this where
> having multiple consumer threads would help? If so, given that I have a single shared
> processing queue in each consumer, how would I leverage that to solve this problem?
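The multi-threaded setup asked about here can be sketched as several fetcher threads, one per partition stream, all feeding one bounded shared queue that a pool of worker threads drains. This is a hedged Python simulation: in-memory lists stand in for partition streams, and `run_pipeline` and its parameters are hypothetical names. On the JVM side, the Kafka 0.8 high-level consumer can hand a single process several streams for a topic, which is the role the fetcher threads play here.

```python
import queue
import threading
from typing import Callable, Dict, List

_STOP = object()  # sentinel telling a worker to exit


def run_pipeline(
    partitions: Dict[int, List[str]],
    num_workers: int,
    handle: Callable[[int, str], None],
    capacity: int = 100,
) -> None:
    """Fetch from every partition concurrently into one bounded queue and
    process it with `num_workers` threads. `handle` must be thread-safe."""
    work = queue.Queue(maxsize=capacity)

    def fetch(partition: int, messages: List[str]) -> None:
        for message in messages:
            work.put((partition, message))  # blocks while the queue is full

    def worker() -> None:
        while True:
            item = work.get()
            if item is _STOP:
                return
            partition, message = item
            handle(partition, message)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    fetchers = [
        threading.Thread(target=fetch, args=(p, msgs))
        for p, msgs in partitions.items()
    ]
    for t in workers + fetchers:
        t.start()
    for t in fetchers:
        t.join()  # every real message is now enqueued
    for _ in workers:
        work.put(_STOP)  # one sentinel per worker, queued after all messages
    for t in workers:
        t.join()


results = []
lock = threading.Lock()


def record(partition: int, message: str) -> None:
    with lock:
        results.append((partition, message))


run_pipeline({0: ["a", "b"], 1: ["c"]}, num_workers=3, handle=record, capacity=2)
print(sorted(results))  # [(0, 'a'), (0, 'b'), (1, 'c')]
```

The worker count, not the number of partitions, caps how many handlers hit the database at once, which matters given the MySQL saturation mentioned above. Once several workers share the queue, messages from one partition can complete out of order; whether that is acceptable depends on the application.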
> 
> -- 
> Ian Friedman
> 
> 
> On Sunday, August 25, 2013 at 12:13 PM, Mark wrote:
> 
> > I don't think it would matter as long as you separate the types of messages into
> > different topics. Then just add more consumers to the ones that are slow. Am I missing
> > something?
> > 
> > On Aug 25, 2013, at 8:59 AM, Ian Friedman <ian@flurry.com> wrote:
> > 
> > > What if you don't know ahead of time how long a message will take to consume?
> > > 
> > > -- 
> > > Ian Friedman
> > > 
> > > 
> > > On Sunday, August 25, 2013 at 10:45 AM, Neha Narkhede wrote:
> > > 
> > > > Making producer-side partitioning depend on consumer behavior might not be
> > > > such a good idea. If consumption is the bottleneck, changing producer-side
> > > > partitioning may not help. To relieve a consumption bottleneck, you may need
> > > > to increase the number of partitions for those topics and the number of
> > > > consumer instances.
> > > > 
> > > > You mentioned that the consumers take longer to process certain kinds of
> > > > messages. What you can do is place the messages that require slower
> > > > processing in separate topics, so that you can scale the number of
> > > > partitions and the number of consumer instances for those messages
> > > > independently.
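Neha's suggestion amounts to choosing the topic on the producer side from whatever is known about a message before it is sent. A minimal Python sketch, assuming a hypothetical `expensive` flag and hypothetical topic names `events-fast` and `events-slow` (the thread names neither), with payload size as a fallback signal since, per the thread, some messages simply carry more data:

```python
from dataclasses import dataclass

# Hypothetical topic names; the thread does not name its topics.
FAST_TOPIC = "events-fast"
SLOW_TOPIC = "events-slow"


@dataclass
class OutgoingMessage:
    payload: bytes
    expensive: bool = False  # hypothetical flag set by the producer


def choose_topic(msg: OutgoingMessage, size_threshold: int = 64 * 1024) -> str:
    """Route known-expensive or unusually large messages to the slow topic,
    so its partitions and consumers can be scaled on their own."""
    if msg.expensive or len(msg.payload) > size_threshold:
        return SLOW_TOPIC
    return FAST_TOPIC


print(choose_topic(OutgoingMessage(b"small")))                  # events-fast
print(choose_topic(OutgoingMessage(b"small", expensive=True)))  # events-slow
```

This only helps when cost can be estimated before sending, which is exactly the limitation Ian raises in his reply.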
> > > > 
> > > > Thanks,
> > > > Neha
> > > > 
> > > > 
> > > > On Sat, Aug 24, 2013 at 9:57 AM, Ian Friedman <ian@flurry.com> wrote:
> > > > 
> > > > > Hey guys! We deployed our Kafka data pipeline application over the
> > > > > weekend, and it is working out quite well now that we've ironed out all
> > > > > the issues. There is one behavior we've noticed that is mildly troubling,
> > > > > though not a deal breaker. We're using a single topic with many partitions
> > > > > (1200 total) to load-balance our 300 consumers, but some partitions end
> > > > > up more backed up than others. This is probably due more to the specifics
> > > > > of the application, since some messages take much longer than others to
> > > > > process.
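With 1200 partitions and 300 consumers, each consumer owns 1200 / 300 = 4 partitions. A sketch of a range-style assignment in Python, similar in spirit to the high-level consumer's range assignment but a hypothetical helper rather than Kafka code: sort the consumers, give each a contiguous block of partitions, and hand any remainder to the first consumers.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per
    consumer in sorted order; the first (num_partitions % len(consumers))
    consumers each receive one extra partition."""
    if not consumers:
        raise ValueError("need at least one consumer")
    ordered = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# The thread's numbers: 1200 partitions across 300 consumers.
consumers = [f"consumer-{i:03d}" for i in range(300)]
plan = range_assign(1200, consumers)
print({len(parts) for parts in plan.values()})  # {4}
print(range_assign(10, ["a", "b", "c"]))
# {'a': [0, 1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]}
```

The assignment balances partition count, not work: a consumer whose four partitions happen to hold slow messages falls behind while others sit idle.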
> > > > > 
> > > > > I'm thinking that the random partitioning in the producer is unsuited to
> > > > > our specific needs. One option I was considering is to write an alternate
> > > > > partitioner that reads the consumer offsets from ZooKeeper (as
> > > > > ConsumerOffsetChecker does) and probabilistically weights the partitions
> > > > > by their lag. Does this sound like a good idea to anyone else? Is there a
> > > > > better, or preferably already built, solution? If anyone has any ideas or
> > > > > feedback, I'd sincerely appreciate it.
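The proposed partitioner can be sketched as a weighted random choice. This Python version assumes the per-partition lag (log size minus committed consumer offset, as ConsumerOffsetChecker reports it) has already been fetched; reading it from ZooKeeper is out of scope. It also assumes less-lagged partitions should be favored, using a hypothetical weight of 1 / (1 + lag), since the thread does not pin down the exact weighting. `choose_partition` is a plain function, not Kafka's Partitioner interface.

```python
import random
from collections import Counter
from typing import Mapping


def choose_partition(lag_by_partition: Mapping[int, int], rng: random.Random) -> int:
    """Pick a partition with probability proportional to 1 / (1 + lag)."""
    if not lag_by_partition:
        raise ValueError("no partitions to choose from")
    partitions = list(lag_by_partition)
    # +1 keeps fully caught-up partitions (lag 0) from dividing by zero.
    weights = [1.0 / (1 + lag_by_partition[p]) for p in partitions]
    return rng.choices(partitions, weights=weights, k=1)[0]


rng = random.Random(42)
lags = {0: 0, 1: 999}  # partition 1 is far behind
counts = Counter(choose_partition(lags, rng) for _ in range(10_000))
print(counts)
```

Note the limit Neha points to: steering new writes away from a lagging partition does nothing for the messages already waiting in it.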
> > > > > 
> > > > > Thanks so much in advance.
> > > > > 
> > > > > P.S. Thanks especially to everyone who's answered my dumb questions on
> > > > > this mailing list over the past few months; we couldn't have done it
> > > > > without you!
> > > > > 
> > > > > --
> > > > > Ian Friedman
> > > > > 