kafka-users mailing list archives

From Tim Ward <tim.w...@origamienergy.com.INVALID>
Subject RE: How do I tell Kafka Streams not to repartition?
Date Tue, 13 Aug 2019 07:45:16 GMT
Thanks.

Tim Ward

-----Original Message-----
From: Matthias J. Sax <matthias@confluent.io>
Sent: 13 August 2019 08:23
To: users@kafka.apache.org
Subject: Re: How do I tell Kafka Streams not to repartition?

At the moment, it's not possible to tell Kafka Streams at the DSL level that
repartitioning is not necessary after a key-changing operation.
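Why the DSL has to assume a repartition is needed: Kafka Streams can only reason about the key, not about an invariant such as "all items for a child ID arrive inside messages with the same parent ID". After re-keying, a record is still processed by the task for the partition chosen by its parent ID, and that is generally not the partition its new child-ID key would map to. The stdlib-only sketch below makes this concrete. It uses `String.hashCode()` as an assumed stand-in for Kafka's default partitioner (which hashes the serialized key with murmur2), and the IDs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Stdlib-only illustration. String.hashCode() stands in for Kafka's default
// partitioner (murmur2 over the serialized key); all IDs are hypothetical.
public class PartitionLocality {

    // Hypothetical stand-in partitioner: maps a key to a partition in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        String parentId = "parent-42";
        List<String> childIds = Arrays.asList("child-1", "child-2", "child-3");

        // The parent message, and every child record derived from it without
        // repartitioning, is processed by the task for this partition.
        int current = partitionFor(parentId, numPartitions);

        for (String childId : childIds) {
            // Where a record keyed by the child ID would be routed by a repartition.
            int byNewKey = partitionFor(childId, numPartitions);
            System.out.printf("%s: processed in partition %d, new key maps to %d%n",
                    childId, current, byNewKey);
        }
    }
}
```

For a per-child aggregation, correctness only needs every record for a given child ID to reach the same task, which the parent-child invariant already guarantees. The mismatch matters when the re-keyed stream must be co-partitioned with something else keyed by child ID, for example in a join.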

I personally think it would be a good improvement to add this
functionality. It's not the first time somebody has asked for it. Feel free
to create a JIRA, and maybe even contribute :) Note that we would
need a KIP for this.


Currently, the only alternative is to not use
`groupByKey().aggregate()`, but instead to use `transformValues()` (or similar)
and implement the aggregation manually.
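A minimal, stdlib-only sketch of the manual aggregation suggested above, under stated assumptions. In a real topology the per-record logic would presumably live in a `ValueTransformerWithKey` passed to `KStream.transformValues(...)` together with the name of a key-value store registered via `StreamsBuilder.addStateStore(...)`; here a `HashMap` stands in for that store so the logic can run on its own. `ChildItem`, its `amount` field, and the running-sum aggregation are hypothetical. The key point is that the record key (the parent ID) is only read, never changed, so the DSL has no reason to insert a repartition.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch: a HashMap stands in for a Kafka Streams KeyValueStore.
// ChildItem and amount are hypothetical names for the data inside a parent message.
public class ManualChildAggregation {

    // Hypothetical child item carried inside a parent message.
    static final class ChildItem {
        final String childId;
        final double amount;

        ChildItem(String childId, double amount) {
            this.childId = childId;
            this.amount = amount;
        }
    }

    // Mirrors the per-record logic of a value transformer: the key (parent ID)
    // is passed in read-only and never changed.
    static final class ChildAggregatingTransformer {
        // Stand-in for a local store keyed by child ID. Keying by child ID alone is
        // safe only because all items for a child ID arrive under one parent ID.
        private final Map<String, Double> store = new HashMap<>();

        // Folds one parent message into the store and returns the updated running
        // total for each child ID in the message. parentId is unused here; it is
        // kept to mirror a transform(readOnlyKey, value) signature.
        Map<String, Double> transform(String parentId, List<ChildItem> items) {
            Map<String, Double> updated = new HashMap<>();
            for (ChildItem item : items) {
                double total = store.getOrDefault(item.childId, 0.0) + item.amount;
                store.put(item.childId, total);
                updated.put(item.childId, total);
            }
            return updated;
        }
    }

    public static void main(String[] args) {
        ChildAggregatingTransformer transformer = new ChildAggregatingTransformer();
        transformer.transform("p1", Arrays.asList(new ChildItem("c1", 1.5), new ChildItem("c2", 2.0)));
        Map<String, Double> latest = transformer.transform("p1", Arrays.asList(new ChildItem("c1", 2.5)));
        System.out.println(latest.get("c1")); // prints 4.0
    }
}
```

Emitting only the children touched by each message keeps the output small; emitting the whole per-child map would be an equally valid design choice.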


-Matthias


On 8/12/19 1:25 AM, Tim Ward wrote:
> I'm using groupByKey, and it causes repartitioning.
>
> I suppose I could aggregate by parent ID, if the data structure into which I aggregate
> by parent ID is itself a map from child ID to what I really want to aggregate. Is that
> what you had in mind? I think it would work!
>
> Give or take a problem I've discovered with persistence following a crash in the middle
> of aggregation, which I'll post separately.
>
> Tim Ward
>
> -----Original Message-----
> From: Boyang Chen <reluctanthero104@gmail.com>
> Sent: 09 August 2019 23:31
> To: users@kafka.apache.org
> Subject: Re: How do I tell Kafka Streams not to repartition?
>
> In case I'm not making myself clear: any operation that changes the record
> key will result in a repartition. Since you don't want that, you should
> call groupByKey() on the stream without changing its key, so that the
> aggregation happens at the `parent id` level.
>
> On Fri, Aug 9, 2019 at 3:27 PM Boyang Chen <reluctanthero104@gmail.com>
> wrote:
>
>> Hey Tim,
>>
>> I think the functionality you need is groupByKey(), which avoids
>> repartitioning. Feel free to check it out here:
>> https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#aggregating.
>> I recommend reading the whole thing, but feel free to just search for
>> `groupByKey`.
>>
>> On Fri, Aug 9, 2019 at 7:14 AM Tim Ward <tim.ward@origamienergy.com>
>> wrote:
>>
>>> I've got an input topic which is keyed by "parent ID". Each message
>>> contains multiple items of data, each for a different "child ID".
>>>
>>> To process these items separately I flatMapValues() the stream to make a
>>> new stream of the inner items of data, keyed by "child ID".
>>>
>>> Now, because I've changed the key, Kafka Streams thinks a repartition is
>>> needed. But in fact it isn't, because all the inner items for a particular
>>> "child ID" will be contained within messages keyed with the same "parent
>>> ID".
>>>
>>> How do I tell Kafka Streams that there is no need to repartition in this
>>> case, because all the data that should remain together in the same instance
>>> of the application will do so without repartitioning? (I appreciate that
>>> Streams can't know about the parent-child relationship unless I *do* tell
>>> it in some way.)
>>>
>>> Tim Ward
>>>
>>> This email is from Origami Energy Limited. The contents of this email and
>>> any attachment are confidential to the intended recipient(s). If you are
>>> not an intended recipient: (i) do not use, disclose, distribute, copy or
>>> publish this email or its contents; (ii) please contact Origami Energy
>>> Limited immediately; and then (iii) delete this email. For more
>>> information, our privacy policy is available here:
>>> https://origamienergy.com/privacy-policy/. Origami Energy Limited
>>> (company number 8619644) is a company registered in England with its
>>> registered office at Ashcombe Court, Woolsack Way, Godalming, GU7 1LQ.
>>>
>>
>
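Tim's idea from his 12 August reply, grouping by the unchanged parent ID and aggregating into a map from child ID to the per-child value, can be sketched as follows. This is a stdlib-only sketch under assumptions: `init()` and `apply(...)` only mirror the shapes of Kafka Streams' `Initializer` and `Aggregator.apply(key, value, aggregate)` that would be passed to `groupByKey().aggregate(...)`; `ChildItem`, `amount`, and the running sum are hypothetical. A real topology would also need a Serde for the `Map<String, Double>` aggregate (for example via `Materialized.with(...)`), which is omitted here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stdlib-only sketch of aggregating by parent ID into a child-ID -> value map.
// ChildItem and amount are hypothetical; no Kafka classes are used.
public class ParentLevelAggregation {

    // Hypothetical child item carried inside a parent message.
    static final class ChildItem {
        final String childId;
        final double amount;

        ChildItem(String childId, double amount) {
            this.childId = childId;
            this.amount = amount;
        }
    }

    // Same shape as an Initializer: the empty aggregate for a new parent ID.
    static Map<String, Double> init() {
        return new HashMap<>();
    }

    // Same shape as Aggregator.apply(key, value, aggregate): folds one parent
    // message into that parent's child-ID -> running-total map. The key stays
    // the parent ID, so groupByKey() needs no repartition.
    static Map<String, Double> apply(String parentId, List<ChildItem> message, Map<String, Double> aggregate) {
        Map<String, Double> next = new HashMap<>(aggregate); // copy rather than mutate the input
        for (ChildItem item : message) {
            next.merge(item.childId, item.amount, Double::sum);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Double> agg = init();
        agg = apply("p1", Arrays.asList(new ChildItem("c1", 1.0), new ChildItem("c2", 2.0)), agg);
        agg = apply("p1", Arrays.asList(new ChildItem("c1", 3.0)), agg);
        System.out.println(agg.get("c1") + " " + agg.get("c2")); // prints "4.0 2.0"
    }
}
```

The trade-off is that each parent ID's aggregate holds every child's state in one store value, so it grows with the number of children per parent.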
