kafka-users mailing list archives

From Michael Noll <mich...@confluent.io>
Subject Re: Kafka Streams dynamic partitioning
Date Wed, 05 Oct 2016 15:52:05 GMT
> So, in this case I should know the max number of possible keys so that
> I can create that number of partitions.

Assuming I understand your original question correctly, then you would not
need to do/know this.  Rather, pick the number of partitions in a way that
matches your needs to process the data in parallel (e.g. if you expect that
you require 10 machines in order to process the incoming data, then you'd
need 10 partitions).  Also, as a general recommendation:  It's often a good
idea to over-partition your topics.  For example, even if today 10 machines
(and thus 10 partitions) would be sufficient, pick a higher number of
partitions (say, 50) so you have some wiggle room to add more machines
(11...50) later if need be.
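
To illustrate why the number of distinct keys does not have to match the number of partitions, here is a minimal Java sketch of the "hash the key, then take it modulo the partition count" idea. It is only an illustration under stated assumptions: the class and method names (PartitionSketch, partitionFor) are hypothetical, and String.hashCode() is used as a stand-in for the murmur hash that the default partitioner applies to the serialized key, so the concrete partition numbers will differ from what Kafka would compute.

```java
import java.util.HashMap;
import java.util.Map;

class PartitionSketch {

    // Hypothetical helper: maps a key to a partition by hashing it and
    // taking the result modulo the partition count.
    // Assumption: String.hashCode() stands in for Kafka's murmur hash.
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 50;   // chosen for parallelism, not for the key count
        int numKeys = 100_000;    // far more keys than partitions

        // Count how many keys land on each partition.
        Map<Integer, Integer> keysPerPartition = new HashMap<>();
        for (int i = 0; i < numKeys; i++) {
            String key = "user-" + i;
            keysPerPartition.merge(partitionFor(key, numPartitions), 1, Integer::sum);
        }

        System.out.println("partitions in use: " + keysPerPartition.size());

        // The same key always maps to the same partition.
        System.out.println("user-42 -> partition " + partitionFor("user-42", numPartitions));
        System.out.println("user-42 -> partition " + partitionFor("user-42", numPartitions));
    }
}
```

Many keys share each partition, and every record for a given key still ends up on the same partition, which is what a keyed windowed aggregation needs. Note that the mapping depends on the partition count, so changing the count later would move existing keys to different partitions.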



On Wed, Oct 5, 2016 at 9:34 AM, Adrienne Kole <adriennekole1@gmail.com>
wrote:

> Hi Guozhang,
>
> So, in this case I should know the max number of possible keys so that I
> can create that number of partitions.
>
> Thanks
>
> Adrienne
>
> On Wed, Oct 5, 2016 at 1:00 AM, Guozhang Wang <wangguoz@gmail.com> wrote:
>
> > By default the partitioner applies a murmur hash to the key and takes it
> > modulo the current number of partitions to determine which partition a
> > record goes to, so records with the same key will be assigned to the
> > same partition. Would that be OK for your case?
> >
> >
> > Guozhang
> >
> >
> > On Tue, Oct 4, 2016 at 3:00 PM, Adrienne Kole <adriennekole1@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > From the Streams documentation, I can see that each Streams instance
> > > processes data independently (of other instances), reads from topic
> > > partition(s), and writes to a specified topic.
> > >
> > >
> > > So here, the partitions of the topic should be determined beforehand and
> > > should remain static.
> > > In my use case I want to create partitioned/keyed (time) windows and
> > > aggregate them.
> > > I can partition the incoming data to the specified topic's partitions,
> > > and each Streams instance can do windowed aggregations.
> > >
> > > However, if I don't know the number of possible keys (to partition),
> > > then what should I do?
> > >
> > > Thanks
> > > Adrienne
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
