kafka-users mailing list archives

From Jonathan Weeks <jonathanbwe...@gmail.com>
Subject Re: Architecture: amount of partitions
Date Fri, 08 Aug 2014 20:13:37 GMT

The approach may well depend on your deployment horizon. Currently, consumer offset tracking for each
partition is done in ZooKeeper, which places a practical upper limit on the number of topics/partitions
you can have and still operate with any kind of efficiency.

In 0.8.2, hopefully coming in the next month or two, consumer offsets are tracked internally in a
Kafka topic rather than in ZK, so the partition-count scalability issue above isn’t as severe.
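
If it helps, the 0.8.2 high-level consumer is expected to expose this through consumer configuration. The property names below are taken from the 0.8.2 consumer config docs, but verify them against the release you actually deploy:

```properties
# Store committed offsets in the internal __consumer_offsets topic
# instead of ZooKeeper (0.8.2 consumer config).
offsets.storage=kafka

# During migration, commit to both Kafka and ZooKeeper so you can
# roll back; disable once fully cut over.
dual.commit.enabled=true
```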

On the broker side, some filesystems, such as XFS, have no problem with hundreds of thousands
of files in a directory. My experience with ext3/ext4 and lots of files has been less happy.

Also, I’m not sure about your retention policy needs for messages in the broker (retention is
usually 7 days by default). Using Kafka as a long-term database probably isn’t a great fit.
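
For reference, the relevant knobs here are time-based retention versus key-based compaction. The property names below follow the 0.8.x broker configuration and should be checked against your version:

```properties
# Default: delete log segments older than 7 days.
log.retention.hours=168

# Alternative: keep only the latest value for each message key,
# which suits "current state per user" workloads better than
# unbounded time-based retention.
log.cleanup.policy=compact
```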

Another approach to consider is to spread users across fewer topics and differentiate based
on a message key that contains the user id, for example.
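
To illustrate the idea: with keyed messages, the default partitioner hashes the key to pick a partition, so all of one user's messages land in the same partition (preserving per-user ordering) without needing a partition per user. This is a rough sketch of that behavior; `UserPartitioner` and `partitionFor` are illustrative names, not actual Kafka classes:

```java
// Sketch of hash-based key partitioning, modeled on Kafka's default
// partitioner (hash of the key modulo the partition count).
public class UserPartitioner {
    static int partitionFor(String userId, int numPartitions) {
        // Mask off the sign bit so the result is always non-negative,
        // even if hashCode() returns Integer.MIN_VALUE.
        return (userId.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 50;
        // The same user id always maps to the same partition,
        // so per-user ordering is preserved.
        System.out.println(partitionFor("user-12345", numPartitions));
    }
}
```

With, say, 50 partitions per topic, hundreds of thousands of users hash into a manageable number of partitions, and compaction on the keyed messages keeps only each user's latest state.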

Best Regards,

-JW

On Aug 8, 2014, at 12:35 PM, Roman Iakovlev <roman.iakovlev@live.com> wrote:

> Dear all,
> 
> I'm new to Kafka, and I'm considering using it for a perhaps unusual
> purpose. I want it to be the backend for data synchronization between a
> multitude of devices which are not always online (mobile and embedded
> devices). All the synchronized information belongs to some user and can be
> identified by the user id. There are several data types, and a user can have
> many entries of each data type coming from many different devices.
> 
> This solution has to scale up to hundreds of thousands of users, and, as far
> as I understand, Kafka stores every partition in a single file. I've been
> thinking about creating a topic for every data type and a separate partition
> for every user. The amount of data stored per user is no more than several
> megabytes over the whole lifetime, because the data would be stored as keyed
> messages, and I expect them to be compacted.
> 
> So what I'm wondering is: would Kafka be the right approach for such a task,
> and if so, would this architecture (one topic per data type and one partition
> per user) scale to the specified extent?
> 
> Thanks, 
> 
> Roman.
> 

