kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrienne Kole <adrienneko...@gmail.com>
Subject Re: Kafka Streams dynamic partitioning
Date Wed, 05 Oct 2016 21:27:09 GMT
Thanks, I got the point. That solves my problem.



On Wed, Oct 5, 2016 at 10:58 PM, Matthias J. Sax <matthias@confluent.io>
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi,
>
> even if you have more distinct keys than partitions (ie, different key
> go to the same partition), if you do "aggregate by key" Streams will
> automatically separate the keys and compute an aggregate per key.
> Thus, you do not need to worry about which keys is hashed to what
> partition.
>
> - -Matthias
>
> On 10/5/16 1:37 PM, Adrienne Kole wrote:
> > Hi,
> >
> > @Ali     IMO, Yes. That is the job of kafka server to assign kafka
> > instances partition(s) to process. Each instance can process more
> > than one partition but one partition cannot be processed by more
> > than one instance.
> >
> > @Michael, Thanks for reply.
> >> Rather, pick the number of partitions in a way that matches your
> >> needs to
> > process the data in parallel I think this should be ' pick number
> > of partitions that matches max number of possible keys in stream to
> > be partitioned '. At least in my usecase , in which I am trying to
> > partition stream by key and make windowed aggregations, if there
> > are less number of topic partitions than possible keys,  then
> > application will not work correctly.
> >
> > That is, if the number of topic partitions is less than possible
> > stream keys, then different keyed stream tuples will be assigned to
> > same topic. That was the problem that I was trying to solve and it
> > seems the only solution is to estimate max number of possible keys
> > and assign accordingly.
> >
> > Thanks Adrienne
> >
> >
> >
> >
> >
> > On Wed, Oct 5, 2016 at 5:55 PM, Ali Akhtar <ali.rac200@gmail.com>
> > wrote:
> >
> >>> It's often a good
> >> idea to over-partition your topics.  For example, even if today
> >> 10 machines (and thus 10 partitions) would be sufficient, pick a
> >> higher number of partitions (say, 50) so you have some wiggle
> >> room to add more machines (11...50) later if need be.
> >>
> >> If you create e.g 30 partitions, but only have e.g 5 instances of
> >> your program, all on the same consumer group, all using kafka
> >> streams to consume the topic, do you still receive all the data
> >> posted to the topic, or will you need to have the same instances
> >> of the program as there are partitions?
> >>
> >> (If you have 1 instance, 30 partitions, will the same rules
> >> apply, i.e it will receive all data?)
> >>
> >> On Wed, Oct 5, 2016 at 8:52 PM, Michael Noll
> >> <michael@confluent.io> wrote:
> >>
> >>>> So, in this case I should know the max number of possible
> >>>> keys so that I can create that number of partitions.
> >>>
> >>> Assuming I understand your original question correctly, then
> >>> you would
> >> not
> >>> need to do/know this.  Rather, pick the number of partitions in
> >>> a way
> >> that
> >>> matches your needs to process the data in parallel (e.g. if you
> >>> expect
> >> that
> >>> you require 10 machines in order to process the incoming data,
> >>> then you'd need 10 partitions).  Also, as a general
> >>> recommendation:  It's often a
> >> good
> >>> idea to over-partition your topics.  For example, even if today
> >>> 10
> >> machines
> >>> (and thus 10 partitions) would be sufficient, pick a higher
> >>> number of partitions (say, 50) so you have some wiggle room to
> >>> add more machines (11...50) later if need be.
> >>>
> >>>
> >>>
> >>> On Wed, Oct 5, 2016 at 9:34 AM, Adrienne Kole
> >>> <adriennekole1@gmail.com> wrote:
> >>>
> >>>> Hi Guozhang,
> >>>>
> >>>> So, in this case I should know the max number of possible
> >>>> keys so that
> >> I
> >>>> can create that number of partitions.
> >>>>
> >>>> Thanks
> >>>>
> >>>> Adrienne
> >>>>
> >>>> On Wed, Oct 5, 2016 at 1:00 AM, Guozhang Wang
> >>>> <wangguoz@gmail.com>
> >>> wrote:
> >>>>
> >>>>> By default the partitioner will use murmur hash on the key
> >>>>> and mode
> >> on
> >>>>> current num.partitions to determine which partitions to go
> >>>>> to, so
> >>> records
> >>>>> with the same key will be assigned to the same partition.
> >>>>> Would that
> >> be
> >>>> OK
> >>>>> for your case?
> >>>>>
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>>
> >>>>> On Tue, Oct 4, 2016 at 3:00 PM, Adrienne Kole <
> >> adriennekole1@gmail.com
> >>>>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> From Streams documentation, I can see that each Streams
> >>>>>> instance is processing data independently (from other
> >>>>>> instances), reads from
> >>> topic
> >>>>>> partition(s) and writes to specified topic.
> >>>>>>
> >>>>>>
> >>>>>> So here, the partitions of topic should be determined
> >>>>>> beforehand
> >> and
> >>>>> should
> >>>>>> remain static. In my usecase I want to create
> >>>>>> partitioned/keyed (time) windows and aggregate them. I
> >>>>>> can partition the incoming data to specified topic's
> >>>>>> partitions
> >> and
> >>>>> each
> >>>>>> Stream instance can do windowed aggregations.
> >>>>>>
> >>>>>> However, if I don't know the number of possible keys (to
> >> partition),
> >>>> then
> >>>>>> what should I do?
> >>>>>>
> >>>>>> Thanks Adrienne
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> -- -- Guozhang
> >>>>>
> >>>>
> >>>
> >>
> >
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - https://gpgtools.org
>
> iQIcBAEBCgAGBQJX9Wl2AAoJECnhiMLycopPVNQQAJnLVIEFTWRdUY41jLEjHEdJ
> Nwqk/M/VrZ3/s8BR9+XKKN+lTd+lQaBFgQUxyae18kIchnEe5r+QB+PoDB4IkTV8
> zS6XhDTr7RwiHdhykGK9bKxhF/0gAiQ4qFu8iBlmwTfH3mOSDgY76z4/wQVnS7Sf
> C1/s2ubvQFgEp0W1OOiTAy2uYhPkeskLjHpFL7Nxc19zGy4a8IeHFo2r1CYCsJHJ
> VBOsLaBgstICTcWnx1lJBjqwhqlXPPo4+dOo+e6h71vuHhFMePhsPuxHQ9nBVKw/
> 0S0X4m+fB2FInx9XOG9rHA3nYvK5zr5eijKMNGGdJfU9lItcM5nhnEnPOI1QLnak
> rrAgwbdeUlv0clo04tAyaxGxz2/F0Z5S3xJa1M5vvAd5895jeKdh1l7UdByQWA5R
> BTkYWodEZ01Zn6fqHkhR5tsWzKLfvFr2bXps/21WzpC90bJK4snUXSs97ugVdT0U
> UgngxEeD9566EENIFzF2HGOrkZd74B5sEs4p5Tp16JhzOydnv9xGGOfxDJXwr0lh
> 5TBcKRqF/998zyil7UOFFecvR7DUYDc/pJIJVffRo7DyjvkOCK1OYBBQB50JTh3s
> blMCHsNu7iXDbRocLT2EigkqKZtQ5w4Xm7e3pEkqQJ/KOnmvJsbg4JFPRC3sw+7X
> h+bHtn7Nbc7HCUhho4nJ
> =9zvn
> -----END PGP SIGNATURE-----
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message