kafka-users mailing list archives

From Reza Aliakbari <raliakb...@gmail.com>
Subject Re: Partition Consumer(s)
Date Fri, 11 Sep 2015 07:15:23 GMT
Monitoring and killing the bad consumer is not a good solution. If my
consumer can't manage its partition well even when there are idle threads,
then I have a bad design.

I can't design a system that in some situations doesn't deliver thousands
of emails because one thread couldn't manage things well (even when I have
enough partitions).

So I understand that Kafka doesn't provide concurrency in the form that
RabbitMQ provides.

I just can't understand why any message should be delayed when I have enough
machines and threads idle.

On Thursday, September  2015, Helleren, Erik <Erik.Helleren@cmegroup.com> wrote:

> So, the general scalability approach with kafka is to add more partitions
> to scale.  If you are using consumer groups and the High Level Consumer
> API, redistribution of partitions is automatic on a failover of a member
> of a consumer group.   But, the High level consumer doesn't allow a
> configuration to break up partitions as is noted here:
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> There isn't really any way for multiple separate clients on separate JVMs
> to coordinate their consumption off of a single partition efficiently. So
> the solution is simply to break up a topic into enough partitions so that
> a single partition is a reasonable unit to scale a consumer by.  If a
> consumer can only handle a single partition or worse, is falling behind,
> your partitions are too large and need to be adjusted.
> And if for some reason a process hangs on a partition, kill it and start
> up a new one. Provided partitions are a reasonable unit of scale, it
> shouldn't be a problem.  There will be a latency spike, but that's better
> than starvation. You can split processing of a single partition pretty
> easily within a JVM.  The kafka consuming runnable can just put messages
> into a concurrent queue of some sort, and then have a large thread pool
> pulling from that queue to do the processing.  That way if a thread in the
> pool gets hung, there are many left to consume off the queue so nothing
> gets hung up.  But this adds some risk on failover based on how kafka does
> offset management for the high level consumer.
> So, I don't think that sending backoff messages to a producer to let up on
> a partition is a good design pattern for kafka. Again, the solution is
> more partitions.  But offset data is stored in either kafka or zookeeper
> depending on your configuration, which can tell you how many messages your
> consumer is behind by.  But, since messages being published should be
> evenly distributed across all partitions for a topic, all partitions
> should be lagging equally.
> If you need a true unified queue, RabbitMQ might be right for your needs.
> But if order doesn't matter at all, kafka should give you more throughput
> with enough partitions.  And since order doesn't matter, you have a lot of
> flexibility here.
> Also, another option to doing everything in a native java client is to use
> a Spark application.  It makes fanning out your data very easy, and has
> some semantics that make it well suited for some of these concerns.
> On 9/10/15, 9:54 AM, "Reza Aliakbari" <raliakbari@gmail.com>
> wrote:
> >Hi Everybody,
> >
> >I have 2 questions regarding the way consumers consume messages of a
> >partition.
> >
> >
> >   - * Is it possible to configure Kafka to allow concurrent message
> >   consumption from one partition? The order is not my concern at
> >   all.*
> >
> >           I couldn't find any way to do that with the Group of Consumers
> >approach. If it is possible, please let me know. If it is impossible, then
> >let me know how to address this problem:
> >           For some reason, a consumer that is assigned to a partition could
> >get very slow, and the messages would be processed very slowly. How can I
> >detect this and stop producing to this slow partition...
> >
> >
> >
> >
> >   - * Suppose I have 5 partitions and 3 consumers and I am using the Group
> >   of Consumers model (I had 5 consumers at the start but 2 servers crashed).
> >   The 3 consumers are busy with their 3 partitions and they never
> >   finish, since the producer produces to their partitions non-stop and a
> >   little faster than they can consume. What happens to the other 2
> >   partitions that are missing consumers? How can the Group of Consumers
> >   handle this issue?*
> >
> >
> >*The order does not matter to me. I need a simple configuration that
> >addresses my concurrency needs, and I need to make sure no message ends up
> >in a starvation scenario where it is never consumed.*
> >
> >Please let us know. We want to choose between Kafka and RabbitMQ, and we
> >prefer Kafka because of its growing community and high throughput, but
> >first we need to address these basic needs.
> >
> >
> >Thanks,
> >
> >Reza Aliakabri
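
The in-JVM fan-out described in the quoted reply (one consuming thread
putting messages into a concurrent queue, with a thread pool pulling from
that queue) could look roughly like the sketch below. It is only an
illustration under assumptions: the class PartitionFanOut, the methods
processAll and handle, and the stop marker are hypothetical names, and a
plain list of strings stands in for the Kafka consuming loop so the example
runs without a broker. As the reply notes, this pattern adds some risk on
failover, because messages still sitting in the queue may not be processed
yet when the consumer's offsets are committed.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one reader for a partition, many workers behind a queue.
class PartitionFanOut {
    // Unique marker object that tells a worker to stop (compared by identity).
    private static final String STOP = new String("__STOP__");

    // Fans the given messages out to `workers` threads and returns how many
    // messages were handled. The list stands in for one partition's stream.
    static int processAll(List<String> messages, int workers) {
        // Bounded queue: the reader blocks when workers fall too far behind.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1000);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String msg = queue.take();
                        if (msg == STOP) {
                            return; // this worker is done
                        }
                        handle(msg);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // Stand-in for the Kafka consuming runnable: enqueue every message.
            for (String msg : messages) {
                queue.put(msg);
            }
            // One stop marker per worker so every worker exits cleanly.
            for (int i = 0; i < workers; i++) {
                queue.put(STOP);
            }
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return processed.get();
    }

    // Placeholder for the real work, e.g. sending one email.
    static void handle(String msg) {
        // intentionally empty in this sketch
    }
}
```

Calling PartitionFanOut.processAll(messages, 8) would spread the messages of
one partition over 8 worker threads. Ordering within the partition is lost,
which matches the "order does not matter" requirement in the original
question.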
