kafka-users mailing list archives

From David Shepherd <dtsheph...@gmail.com>
Subject Re: max number of partitions with v0.9.0
Date Wed, 04 May 2016 04:00:57 GMT
Thanks Ben, that's what I thought and I believe your suggestion is
essentially what I planned to implement. We have a single topic with raw
messages that will be partitioned randomly on ingest (just for
scalability). I planned to install a consumer group router that reads from
this "raw" topic and routes messages to "normal" or "throttled" topics.
Both of these topics would be partitioned by the ID since I need the
guarantee of a single consumer processing messages for a given ID. Routing
would be very fast, while processing each message is much slower.
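
Here's roughly the shape of the router I have in mind, just as a minimal sketch
against the plain 0.9 consumer/producer clients. The one-minute tumbling window
and the threshold are placeholders, it assumes the ID rides along as the record
key, and only the topic names ("raw", "normal", "throttled") come from the
description above:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class RateRouter {

    // Placeholder threshold: messages per ID per minute before we throttle.
    private static final int LIMIT_PER_MINUTE = 1000;

    private final Map<String, Integer> counts = new HashMap<>();
    private long windowStart = System.currentTimeMillis();

    // Simple tumbling one-minute window; a real router would probably want a
    // sliding window and persistence across restarts.
    private boolean overLimit(String id) {
        long now = System.currentTimeMillis();
        if (now - windowStart > 60_000) {
            counts.clear();
            windowStart = now;
        }
        return counts.merge(id, 1, Integer::sum) > LIMIT_PER_MINUTE;
    }

    public static void main(String[] args) {
        // Consumer and producer configs combined for brevity; unused keys are ignored.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rate-router");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        consumer.subscribe(Collections.singletonList("raw"));

        RateRouter router = new RateRouter();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                String id = record.key();  // assumes the ID is the record key
                String target = router.overLimit(id) ? "throttled" : "normal";
                // Keyed by ID, so all messages for one ID stay in one partition
                // of the target topic and are processed by a single consumer.
                producer.send(new ProducerRecord<>(target, id, record.value()));
            }
        }
    }
}

With the two output topics keyed like this, the throttled consumer group can
run with fewer instances or a slower poll rate without touching the normal
path.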

Know of any existing rate-based message routers between Kafka topics?

-Dave
On Tue, May 3, 2016 at 11:42 PM Benjamin Manns <benmanns@gmail.com> wrote:

> From my (beginner's) knowledge, each partition still requires at least a
> file descriptor on the Kafka brokers. The new consumer structure means
> consumers won't store data in ZooKeeper, but topics and partitions still
> do.
>
> What I would do is key by your ID and place a rate-limiting stream
> processor in front of your heavier processors. This could be a windowed
> task that counts how many messages have been sent in the last few seconds
> or minutes. For under-limit IDs, send to a high-priority topic; for
> over-limit IDs, a lower-priority topic.
>
>
> Ben
>
> On Tuesday, May 3, 2016, David Shepherd <dtshepherd@gmail.com> wrote:
>
> > I was wondering if the new Kafka Consumer introduced in 0.9.0
> > (http://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0.9-consumer-client)
> > allows for a higher number of partitions in a given cluster, since it
> > removes the ZooKeeper dependency. I understand the file descriptor and
> > availability concerns discussed here:
> > http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
> >
> >
> > The reason I ask is because we'd like to use partitioning to limit the
> > impact of a message flood on our downstream consumers. If we can
> > partition by a particular ID, it will isolate message floods from a given
> > source into a single partition, which allows us to allocate a single
> > consumer to process that flood without affecting quality of service to
> > the rest of the system. Unfortunately, partitioning this way could create
> > millions of partitions, each producing only a few messages per minute,
> > with the exception of a few partitions that would be sending thousands of
> > messages per minute.
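
As an aside, the single-consumer guarantee above comes from keying by the ID:
with the default partitioner, a non-null key hashes to a fixed partition, and
within a consumer group each partition is owned by one consumer at a time. A
minimal illustration (the class, topic, and parameter names here are made up):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedSend {
    // Sending with the source ID as the key means the partition is derived
    // from the key (roughly hash(keyBytes) % numPartitions for the default
    // partitioner), so every message from one ID lands in the same partition.
    static void send(KafkaProducer<String, String> producer,
                     String topic, String sourceId, String payload) {
        producer.send(new ProducerRecord<>(topic, sourceId, payload));
    }
}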
> >
> > I'm also open to suggestions on how others have solved the flooding /
> > noisy-neighbor problem in Kafka.
> >
> > Thanks,
> > Dave Shepherd
> >
>
>
> --
> Benjamin Manns
> benmanns@gmail.com
> (434) 321-8324
>
