kafka-users mailing list archives

From John Roesler <j...@confluent.io>
Subject Re: Kafka Streams - Merge vs. Join
Date Thu, 09 Aug 2018 20:43:23 GMT
Hi John,

Sorry for the confusion! I just noticed that we failed to document the
merge operator. I've created
https://issues.apache.org/jira/browse/KAFKA-7269 to fix it.

But in the meantime:
* merge: interleave the records from two streams into one combined stream,
passing every record through unchanged
* join: compute a new stream by combining records from the two inputs that
share the same key

For example:
input-1:
(A, 1)
(B, 2)

input-2:
(A, 500)
(C, 60)

outerJoin( (l, r) -> new KeyValue(l, r) ):    // simplified API; an inner join would emit only (A, (1, 500))
(A, (1, 500) )
(B, (2, null) )
(C, (null, 60) )

merge:
(A, 1)
(A, 500)
(B, 2)
(C, 60)

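To make the difference concrete, here is a minimal plain-Java sketch that reproduces the example above without any Kafka dependency. The class MergeVsJoin and its two helper methods are hypothetical names made up for illustration, not part of the Kafka Streams API. It also assumes one record per key and ignores join windows entirely, so treat it as a model of the semantics rather than of the real operators. The join keeps unmatched keys, which is the outer-join behaviour shown in the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MergeVsJoin {

    // Hypothetical helper: merge passes every record from both inputs through
    // unchanged. Here the inputs are simply concatenated; the real operator
    // gives no ordering guarantee between records of the two streams.
    static List<Map.Entry<String, Integer>> merge(
            List<Map.Entry<String, Integer>> left,
            List<Map.Entry<String, Integer>> right) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Hypothetical helper: outer join pairs up values by key and fills in
    // null where one side has no record for that key.
    // Assumption: one record per key, no windowing.
    static Map<String, List<Integer>> outerJoin(
            Map<String, Integer> left,
            Map<String, Integer> right) {
        Set<String> keys = new LinkedHashSet<>(left.keySet());
        keys.addAll(right.keySet());
        Map<String, List<Integer>> out = new LinkedHashMap<>();
        for (String key : keys) {
            // Arrays.asList permits null elements, unlike List.of
            out.put(key, Arrays.asList(left.get(key), right.get(key)));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> input1 = new LinkedHashMap<>();
        input1.put("A", 1);
        input1.put("B", 2);

        Map<String, Integer> input2 = new LinkedHashMap<>();
        input2.put("A", 500);
        input2.put("C", 60);

        System.out.println("merge: "
                + merge(new ArrayList<>(input1.entrySet()),
                        new ArrayList<>(input2.entrySet())));
        System.out.println("outer join: " + outerJoin(input1, input2));
    }
}
```

Note that merge yields four records (one per input record) while the join yields three (one per distinct key). In the actual DSL the corresponding calls are KStream#merge and KStream#outerJoin, and a stream-stream join additionally needs a JoinWindows argument.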

Does that make sense?
-John Roesler

On Thu, Aug 9, 2018 at 2:13 PM <jheller@ups.com.invalid> wrote:

> Hi All,
>
> I am a little confused on the difference between the
> KStreamBuilder merge() function and doing a KStream-to-KStream Join
> operation. I understand the difference between Inner, Left and Outer joins,
> but I don't understand exactly what the difference is between the two. It
> appears to me that both ways would merge two streams into a single stream,
> but the joins do have the ability to remove duplicate data. Is that the
> only difference? Also, on a side note, I am really clueless as to what the
> difference between Windowed and Windowless means when referring to the
> joins.
>
> Any help would be greatly appreciated. Thank you.
>
> John Heller
