kafka-users mailing list archives

From Tim Ward <tim.w...@origamienergy.com.INVALID>
Subject RE: How do I tell Kafka Streams not to repartition?
Date Mon, 12 Aug 2019 08:25:13 GMT
I'm using groupByKey, and it causes repartitioning.

I suppose I could aggregate by parent ID, if the data structure I aggregate into is itself
a map from child ID to the value I actually want to aggregate. Is that what you had in
mind? I think it would work!
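A minimal plain-Java sketch of that idea, assuming (hypothetically) that each inner item carries a `childId` and a numeric `value` to be summed. `ParentAggregation`, `Item` and the running sum are illustrative names and choices, not something from this thread. The two methods have the same shape as Kafka Streams' `Initializer` and `Aggregator` interfaces, so in a topology they could be passed as `groupByKey().aggregate(ParentAggregation::initial, ParentAggregation::aggregate, ...)` with a `Materialized` that supplies a serde for the map (not shown here).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the aggregate for each parent ID is a map from child ID to a
// running total, so grouping stays on the parent key while results remain per child.
final class ParentAggregation {

    // Hypothetical inner item carried inside a parent message.
    static final class Item {
        final String childId;
        final long value;

        Item(String childId, long value) {
            this.childId = childId;
            this.value = value;
        }
    }

    // Same shape as Kafka Streams' Initializer#apply(): an empty per-parent aggregate.
    static Map<String, Long> initial() {
        return new HashMap<>();
    }

    // Same shape as Kafka Streams' Aggregator#apply(key, value, aggregate).
    // The key is the parent ID; the child ID is read from the value instead.
    static Map<String, Long> aggregate(String parentId, Item item, Map<String, Long> agg) {
        Map<String, Long> next = new HashMap<>(agg); // leave the previous aggregate untouched
        next.merge(item.childId, item.value, Long::sum);
        return next;
    }
}
```

Because the record key never changes, the per-child results live inside one state-store entry per parent rather than one entry per child.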

Give or take a problem I've discovered with persistence following a crash in the middle of
aggregation, which I'll post separately.

Tim Ward

-----Original Message-----
From: Boyang Chen <reluctanthero104@gmail.com>
Sent: 09 August 2019 23:31
To: users@kafka.apache.org
Subject: Re: How do I tell Kafka Streams not to repartition?

In case I'm not making myself clear: any operation that changes the record
key will cause a repartition when a grouping or join follows it. Since you
don't want that, keep `parent id` as the key and call groupByKey afterwards,
so the aggregation happens at the `parent id` level.
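To make that suggestion concrete, here is a plain-Java model (no Kafka dependency) of flattening each parent message while keeping the parent ID as the key, then grouping on that unchanged key. In the real DSL, `flatMapValues()` cannot change the key at all (re-keying needs `map()`, `flatMap()` or `selectKey()`), which is why a following `groupByKey()` needs no repartition topic, provided nothing upstream changed the key. The `KV` type and the method names below are hypothetical stand-ins, not Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeepParentKey {

    // Hypothetical key/value pair standing in for a stream record.
    static final class KV<K, V> {
        final K key;
        final V value;

        KV(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    // Like flatMapValues(): each inner value is emitted with the unchanged input key.
    static <K, V> List<KV<K, V>> flattenValues(List<KV<K, List<V>>> input) {
        List<KV<K, V>> out = new ArrayList<>();
        for (KV<K, List<V>> record : input) {
            for (V v : record.value) {
                out.add(new KV<>(record.key, v));
            }
        }
        return out;
    }

    // Like groupByKey(): groups on the key records already have; no new key is computed.
    static <K, V> Map<K, List<V>> groupByExistingKey(List<KV<K, V>> records) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (KV<K, V> record : records) {
            groups.computeIfAbsent(record.key, k -> new ArrayList<>()).add(record.value);
        }
        return groups;
    }
}
```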

On Fri, Aug 9, 2019 at 3:27 PM Boyang Chen <reluctanthero104@gmail.com>
wrote:

> Hey Tim,
>
> I think the functionality you need is groupByKey(), which avoids
> repartitioning as long as the key has not been changed upstream; see:
> https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#aggregating
> I recommend reading the whole page, but feel free just to search for
> `groupByKey`.
>
> On Fri, Aug 9, 2019 at 7:14 AM Tim Ward <tim.ward@origamienergy.com>
> wrote:
>
>> I've got an input topic which is keyed by "parent ID". Each message
>> contains multiple items of data, each for a different "child ID".
>>
>> To process these items separately I flatMapValues() the stream to make a
>> new stream of the inner items of data, keyed by "child ID".
>>
>> Now, because I've changed the key, Kafka Streams thinks a repartition is
>> needed. But in fact it isn't, because all the inner items for a particular
>> "child ID" will be contained within messages keyed with the same "parent
>> ID".
>>
>> How do I tell Kafka Streams that there is no need to repartition in this
>> case, because all the data that should remain together in the same instance
>> of the application will do so without repartitioning? (I appreciate that
>> Streams can't know about the parent-child relationship unless I *do* tell
>> it in some way.)
>>
>> Tim Ward
>>
>> This email is from Origami Energy Limited. The contents of this email and
>> any attachment are confidential to the intended recipient(s). If you are
>> not an intended recipient: (i) do not use, disclose, distribute, copy or
>> publish this email or its contents; (ii) please contact Origami Energy
>> Limited immediately; and then (iii) delete this email. For more
>> information, our privacy policy is available here:
>> https://origamienergy.com/privacy-policy/. Origami Energy Limited
>> (company number 8619644) is a company registered in England with its
>> registered office at Ashcombe Court, Woolsack Way, Godalming, GU7 1LQ.
>>
>
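The co-location argument in Tim's original question (every child ID only ever appears in messages for one parent ID) can be checked with a small plain-Java model: if it holds, partitioning by parent ID already places all of a child's items on one partition. The partitioner below uses `String.hashCode()` purely for illustration; Kafka's default partitioner hashes the serialized key bytes with murmur2, but the argument holds for any deterministic hash. `CoLocationCheck` and its input map are hypothetical. Streams itself cannot see this relationship, which is why keeping the parent ID as the key is the way to avoid the repartition.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CoLocationCheck {

    // Illustrative partitioner only (not Kafka's murmur2-based default).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // childIdsByParent: hypothetical map from parent ID to the child IDs in its messages.
    // Returns true if no child ID ends up on more than one partition when routing by parent ID.
    static boolean childrenCoLocated(Map<String, List<String>> childIdsByParent, int numPartitions) {
        Map<String, Integer> partitionOfChild = new HashMap<>();
        for (Map.Entry<String, List<String>> e : childIdsByParent.entrySet()) {
            int p = partitionFor(e.getKey(), numPartitions);
            for (String child : e.getValue()) {
                Integer previous = partitionOfChild.putIfAbsent(child, p);
                if (previous != null && previous != p) {
                    return false; // same child seen on two partitions
                }
            }
        }
        return true;
    }
}
```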