kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guozhang Wang <wangg...@gmail.com>
Subject Re: How does KStream transform performs repartitioning?
Date Tue, 22 May 2018 16:20:54 GMT
Hello Edmondo,

If you have a join operator following the transform() operator, then the
joining streams will be sent to a repartition topic, and the join
operator's hosted thread will then read from that repartition topic. This
is for "re-shuffling" the streams since the key of the stream record may
have changed after transform. It should not cause joining to not tick. But
I'd suggest you check your repartition topic and see if it is correctly
created and populated (note that if you add such an operator into your app,
your topology may have been changed largely, which you can check by calling
Topology#describe() to see the difference, such that you may not be able to
do a in-place upgrade any more, but have to restart it after using the
streams reset tool).


On Tue, May 22, 2018 at 3:12 AM, Edmondo Porcu <edmondo.porcu@gmail.com>

> Hello users,
> we are performing a Transform so that out of a larger message we emit a new
> output record only if that specific field has changed.
> Since we introduced that to reduce the number of output records, our final
> Kstream - KStream windowed join is not ticking anymore, although the window
> is large enough (1year). Our both streams have 3 partitions.
> We noticed that in the KStreams documentation, the following sentence
> appears:
>      * Transforming records might result in an internal data redistribution
> if a key based operator (like an aggregation
>      * or join) is applied to the result {@code KStream}.
>      * (cf. {@link #transformValues(ValueTransformerSupplier, String...)})
> What does this redistribution mean and how does it work? Why have we lost
> joins result?
> Thank you

-- Guozhang

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message