kafka-users mailing list archives

From "Matthias J. Sax" <mj...@apache.org>
Subject Re: KafkaStreams GroupBy with new key. Can I skip repartition?
Date Sun, 01 Mar 2020 19:16:04 GMT

I don't think that KIP-221 addressed the discussed use case.

KIP-221 allows users to force a repartitioning manually, while the use
case described in the original email was to suppress/skip a
repartitioning step.

The issue of avoiding unnecessary repartitioning has come up a few
times already, and I personally believe it's worth closing this gap.
But we would need to do a KIP to introduce some API that allows users
to tell Kafka Streams that repartitioning is not necessary.

In Apache Flink, there is an operator called
`reinterpretAsKeyedStream`. We could introduce something similar.
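
To make the precondition from the original question concrete, here is a
minimal, self-contained sketch of what "the new key still falls in the
same partition" could mean. Everything in it is an assumption for
illustration: the `tenant|order` key layout, the class and method names,
and the hash function (it is not Kafka's default partitioner). It only
checks the property; it does not change what Kafka Streams does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only: none of these names exist in Kafka Streams.
class CoPartitionCheck {

    // Assumed key layout "<tenant>|<rest>": only the part before '|' is
    // used for partitioning (e.g. by a custom partitioner on the producer).
    static String partitionPrefix(String key) {
        int idx = key.indexOf('|');
        return idx < 0 ? key : key.substring(0, idx);
    }

    // Stand-in partition function. It is deterministic, but it is NOT the
    // murmur2-based default partitioner that Kafka producers use.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = partitionPrefix(key).getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(bytes), numPartitions);
    }

    // The key selector one might pass to groupBy(): regroup by tenant.
    static String newKey(String oldKey) {
        return partitionPrefix(oldKey);
    }

    // True if re-keying leaves the record in the partition it is already in,
    // i.e. a repartition topic would not move any data.
    static boolean staysInSamePartition(String oldKey, int numPartitions) {
        return partitionFor(oldKey, numPartitions)
                == partitionFor(newKey(oldKey), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 6;
        String[] keys = {"tenant-1|order-42", "tenant-2|order-7", "tenant-1|order-99"};
        for (String key : keys) {
            System.out.println(key + " -> " + newKey(key)
                    + ", same partition: " + staysInSamePartition(key, numPartitions));
        }
    }
}
```

Under these assumptions the property holds for every key, because the
partition depends only on the prefix that the new key preserves. Kafka
Streams has no way to know this, which is why an explicit API would be
needed to skip the repartition topic.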

-Matthias

On 3/1/20 4:43 AM, John Roesler wrote:
> Hi all,
> The KIP is accepted and implemented already, but is blocked on
> code review: https://github.com/apache/kafka/pull/7170
> A quick note on the lack of recent progress... It's completely our
> fault, the reviews fell by the wayside during the 2.5.0 release
> cycle, and we haven't gotten back to it. The contributor, Levani,
> has been exceptionally patient with us and continually kept the PR
> up-to-date and mergeable since then.
> If you'd like to help get it across the line, Murilo, maybe you can
> give it a review?
> Thanks, John
> On Sat, Feb 29, 2020, at 20:52, Guozhang Wang wrote:
>> It is in progress, but I was not the main reviewer of that ticket
>> so I cannot say for sure. I saw the last update is on Jan/2019 so
>> maybe it's a bit loose now.. If you want to pick it up and revive
>> the KIP completion feel free to do so :)
>> Guozhang
>> On Fri, Feb 28, 2020 at 5:54 PM Murilo Tavares
>> <murilofla@gmail.com> wrote:
>>> Guozhang,
>>> The ticket definitely describes what I’m trying to achieve. And
>>> should I be hopeful with the fact it’s in progress? :)
>>> Thanks for pointing that out.
>>> Murilo
>>> On Fri, Feb 28, 2020 at 2:57 PM Guozhang Wang
>>> <wangguoz@gmail.com> wrote:
>>>> Hi Murilo,
>>>> Would this be helping your case?
>>>> https://issues.apache.org/jira/browse/KAFKA-4835
>>>> Guozhang
>>>> On Fri, Feb 28, 2020 at 7:01 AM Murilo Tavares
>>>> <murilofla@gmail.com> wrote:
>>>>> Hi,
>>>>> I am currently doing a simple KTable groupBy().aggregate() in
>>>>> KafkaStreams. In the groupBy I do need to select a new key, but I
>>>>> know for sure that the new key would still fall in the same
>>>>> partition. Because of this, I believe the repartition would not
>>>>> be necessary, but my question is: is it possible to do a groupBy,
>>>>> changing the key, and tell KafkaStreams to not create the
>>>>> repartition topic?
>>>>> Thanks,
>>>>> Murilo
>>>> -- Guozhang
>> -- Guozhang

