kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Architecture: amount of partitions
Date Fri, 08 Aug 2014 20:59:45 GMT
Kane,

The built-in offset management is already in the master branch and will be
included in 0.8.2. For now you can give the current trunk a spin.

Guozhang


On Fri, Aug 8, 2014 at 1:42 PM, Kane Kane <kane.isturm@gmail.com> wrote:

> Hello Guozhang,
>
> Is storing offsets in a Kafka topic already in the master branch?
> We would like to use that feature. When do you plan to release 0.8.2?
> Can we use the master branch meanwhile (i.e. is it stable enough)?
>
> Thanks.
>
> On Fri, Aug 8, 2014 at 1:38 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
> > Hi Roman,
> >
> > Kafka's current messaging guarantee is at-least-once, and we are working on
> > transactional messaging features to make it exactly-once. After that, we
> > expect it to be usable as a synchronization/replication layer for storage
> > systems, as in your use case.
> >
> > As for your design, since you will probably have a lot of users and each
> > user's data is small, you will end up with many small files on Kafka. If
> > all you want is per-user order preservation, you can probably just use
> > keyed messages with the user id as the key, so that all messages with the
> > same key end up in the same partition and hence are consumed by the same
> > consumer client. With that you only need a small, fixed number of
> > partitions.
> >
> > Guozhang
> >
> >
> > On Fri, Aug 8, 2014 at 12:35 PM, Roman Iakovlev <roman.iakovlev@live.com> wrote:
> >
> >> Dear all,
> >>
> >>
> >>
> >> I'm new to Kafka, and I'm considering using it for a perhaps unusual
> >> purpose. I want it to be a backend for data synchronization between a
> >> multitude of devices that are not always online (mobile and embedded
> >> devices). All the synchronized information belongs to some user and can be
> >> identified by the user id. There are several data types, and a user can
> >> have many entries of each data type coming from many different devices.
> >>
> >>
> >>
> >> This solution has to scale up to hundreds of thousands of users, and, as
> >> far as I understand, Kafka stores every partition in a single file. I've
> >> been thinking about creating a topic for every data type and a separate
> >> partition for every user. The amount of data stored by every user is no
> >> more than several megabytes over the whole lifetime, because the data
> >> stored would be keyed messages, and I'm expecting it to be compacted.
> >>
> >>
> >>
> >> So what I'm wondering is: would Kafka be the right approach for such a
> >> task, and if yes, would this architecture (one topic per data type and one
> >> partition per user) scale to the specified extent?
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Roman.
> >>
> >>
> >
> >
> > --
> > -- Guozhang
>



-- 
-- Guozhang
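
The keyed-message suggestion quoted above can be illustrated with a small
sketch. It does not use a Kafka client and it does not reproduce Kafka's actual
partitioner. It only assumes that a partition is chosen as a stable hash of the
key modulo the partition count. The names partition_for, route and
NUM_PARTITIONS are hypothetical, and CRC32 is a stand-in hash chosen for
determinism.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumption: a small, fixed partition count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from a stable hash of the key (not Kafka's real hash)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(messages):
    """Append each (user_id, payload) pair to its partition's log in arrival order."""
    logs = defaultdict(list)
    for user_id, payload in messages:
        logs[partition_for(user_id)].append((user_id, payload))
    return logs


if __name__ == "__main__":
    msgs = [("alice", "a1"), ("bob", "b1"), ("alice", "a2"), ("bob", "b2")]
    for partition, log in sorted(route(msgs).items()):
        print(partition, log)
```

Because every message for a given user id lands in the same partition, per-user
ordering holds without one partition per user, and the partition count stays
independent of the number of users.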
