kafka-users mailing list archives

From Benjamin Black <...@b3k.us>
Subject Re: Managing Millions of Partitions in Kafka
Date Mon, 07 Oct 2013 00:37:25 GMT
What you are discovering is that Kafka is a message broker, not a database.


On Sun, Oct 6, 2013 at 5:34 PM, Ravindranath Akila <
ravindranathakila@gmail.com> wrote:

> Thanks a lot Neha!
>
> Actually, using keyed messages (with Simple Consumers) was the approach we
> took. But it seems we can't map each user to a new partition due to
> Zookeeper limitations. Rather, we will have to map a "group" of users onto
> one partition, and then we can't fetch the messages for only one user.
>
> It seems our data is best put on HBase with a TTL and versioning.
>
> Thanks!
>
> R. A.
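
The trade-off described in the reply above, where a "group" of users shares one
partition, means a per-user read has to filter inside that partition. A minimal
sketch with the current Java consumer client (which postdates this thread),
assuming the partition the user's key hashes to is already known; the broker
address, topic name, group id, and partition number 7 are illustrative, not
from the thread:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class SingleUserReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
            props.put("group.id", "single-user-reader");      // illustrative group id
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Read only the partition that user-42's key hashes to (7 here is
                // purely illustrative), then drop every other user's records.
                consumer.assign(Collections.singletonList(new TopicPartition("user-events", 7)));
                consumer.seekToBeginning(consumer.assignment());
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    if ("user-42".equals(record.key())) {
                        System.out.println(record.value());
                    }
                }
            }
        }
    }
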
> On 6 Oct 2013 16:00, "Neha Narkhede" <neha.narkhede@gmail.com> wrote:
>
> > Kafka is designed to handle on the order of a few thousand partitions,
> > roughly less than 10,000, and the main bottleneck is ZooKeeper. A better
> > way to design such a system is to have fewer partitions and use keyed
> > messages to distribute the data over a fixed set of partitions.
> >
> > Thanks,
> > Neha
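
For concreteness, a minimal sketch of the keyed-messages-over-a-fixed-partition-set
approach described above, written against the current Java producer client rather
than the 0.8-era producer available when this thread was written; the broker
address, topic name, and serializers are assumptions:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserEventProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // The user id is the message key; with a keyed message the producer
                // hashes the key, so all events for one user land in the same
                // partition of a topic with a fixed partition count.
                String userId = "user-42";
                producer.send(new ProducerRecord<>("user-events", userId, "some event payload"));
            }
        }
    }
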
> > On Oct 5, 2013 8:19 PM, "Ravindranath Akila" <ravindranathakila@gmail.com>
> > wrote:
> >
> > > Initially, I thought dynamic topic creation could be used to maintain
> > > per-user data on Kafka. Then I read that partitions can and should be
> > > used for this instead.
> > >
> > > If a partition is to be used to map a user, can there be a million, or
> > > even a billion, partitions in a cluster? How does one go about designing
> > > such a model?
> > >
> > > Can the replication tool be used to assign, say, partitions 1-10,000 to
> > > replica 1, and 10,001-20,000 to replica 2?
> > >
> > > If not, since there is a ulimit on the file system, should one model it
> > > based on a replica/topic/partition approach? Say users 1-10,000 go on
> > > topic 10k-1, which has 10,000 partitions, and users 10,001-20,000 go on
> > > topic 10k-2, which also has 10,000 partitions.
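
The sharding scheme in the question above is essentially a fixed arithmetic
mapping from user id to a (topic, partition) pair. A hypothetical helper,
using only the 10,000-users-per-topic figure from the question:

    // Hypothetical helper for the "10k-N" scheme above: 10,000 users per topic,
    // one partition per user within that topic, users numbered from 1 upward.
    public class UserShard {
        static final int USERS_PER_TOPIC = 10_000;

        static String topicFor(long userId) {
            return "10k-" + ((userId - 1) / USERS_PER_TOPIC + 1);  // users 1-10,000 -> "10k-1"
        }

        static int partitionFor(long userId) {
            return (int) ((userId - 1) % USERS_PER_TOPIC);         // 0-based partition inside that topic
        }

        public static void main(String[] args) {
            System.out.println(topicFor(10_001) + " / " + partitionFor(10_001));  // prints "10k-2 / 0"
        }
    }
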
> > >
> > > Simply put, how can a million stateful data points be handled? (I
> > > deduced that a userid-to-partition-number mapping can be done via a
> > > partitioner, but unless a server can be configured to handle only a
> > > given set of partitions, with a range-based notation, it is almost
> > > impossible to handle a large dataset. Is it that Kafka can only handle
> > > a limited set of stateful data right now?)
> > >
> > >
> > >
> > >
> > > http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions
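
On the userid-to-partition mapping mentioned above: in the current Java producer
client (not the 0.8 API in use when this was written), that mapping is pluggable
through the Partitioner interface. A minimal sketch; the class name and the
hash-based mapping are assumptions:

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Illustrative partitioner: derives the partition from the user id carried
    // in the message key, so one user's data always lands in the same partition.
    public class UserIdPartitioner implements Partitioner {
        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            // Any stable function of the (non-null) user-id key works; a plain
            // hash, masked to stay non-negative, is shown here.
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }

        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public void close() { }
    }

It would be registered on the producer with props.put("partitioner.class",
UserIdPartitioner.class.getName()). Note that this only fixes which partition a
user maps to; it does not remove the limit on how many partitions a cluster can
hold.
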
> > >
> > > Btw, why does Kafka have to keep every partition open? Can't a partition
> > > be opened for read/write only when needed?
> > >
> > > Thanks in advance!
> > >
> >
>
