kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Merging Two KTables
Date Tue, 23 Jan 2018 21:30:06 GMT
Hi Sameer, Dmitry:

Just a side note: KStream.merge() does not guarantee timestamp ordering,
so the resulting KStream is likely to contain records that are out of order
with respect to their timestamps. If you want a merge operation that
respects the timestamps of the input streams because you believe they are
well aligned, you need to either assume that none of the input streams
contains out-of-order data, so that an online merge-sort can be applied, or
assume that the out-of-orderness has some upper bound in practice, so that
you can buffer records and wait. As said, there is no gold-standard rule
for merging, so we leave it to users to customize this via the
"process(Processor)" API, or to use "merge" if they can tolerate
out-of-order timestamps in the resulting stream.
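
To illustrate the second option (bounded out-of-orderness, buffer and
wait), here is a minimal sketch in plain Java. It is not Kafka Streams code:
the class BoundedLatenessMerger, its nested Stamped type, and the
maxLatenessMs parameter are hypothetical names chosen for this example. It
assumes no record arrives more than maxLatenessMs behind the highest
timestamp seen so far; records that violate this assumption would still be
emitted, but out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Kafka Streams API): timestamp-ordered merge with bounded lateness. */
final class BoundedLatenessMerger<V> {

    /** A value paired with its event timestamp in milliseconds. */
    static final class Stamped<V> {
        final long timestamp;
        final V value;

        Stamped(long timestamp, V value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Assumed upper bound on how far behind the newest record a late record may be.
    private final long maxLatenessMs;

    // Min-heap on timestamp; records with equal timestamps come out in arbitrary order.
    private final PriorityQueue<Stamped<V>> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Stamped<V> s) -> s.timestamp));

    private long maxSeenTimestamp = Long.MIN_VALUE;

    BoundedLatenessMerger(long maxLatenessMs) {
        if (maxLatenessMs < 0) {
            throw new IllegalArgumentException("maxLatenessMs must be >= 0");
        }
        this.maxLatenessMs = maxLatenessMs;
    }

    /**
     * Accepts a record from any input stream and returns the buffered records
     * that are now safe to emit, in non-decreasing timestamp order.
     */
    List<Stamped<V>> add(long timestamp, V value) {
        buffer.add(new Stamped<>(timestamp, value));
        maxSeenTimestamp = Math.max(maxSeenTimestamp, timestamp);

        // Under the lateness assumption, nothing older than this can still arrive.
        long watermark = maxSeenTimestamp - maxLatenessMs;

        List<Stamped<V>> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek().timestamp <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    /** Emits everything still buffered, in timestamp order (for example at shutdown). */
    List<Stamped<V>> flush() {
        List<Stamped<V>> rest = new ArrayList<>();
        while (!buffer.isEmpty()) {
            rest.add(buffer.poll());
        }
        return rest;
    }
}
```

In a real topology, logic along these lines would have to live in a custom
processor with a state store as the buffer, and the trade-off is the usual
one: a larger bound tolerates more disorder but delays output and holds more
records in memory.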


Guozhang


On Tue, Jan 23, 2018 at 1:12 PM, Matthias J. Sax <matthias@confluent.io>
wrote:

> Well. That is one possibility I guess. But some other way might be to
> "merge both values" into a single one... There is no "straight forward"
> best semantics IMHO.
>
> If you really need this, you can build it via Processor API.
>
>
> -Matthias
>
>
> On 1/23/18 7:46 AM, Dmitry Minkovsky wrote:
> >> Merging two tables does not make too much sense because each table might
> >> contain an entry for the same key. So it's unclear, which of both values
> >> the merged table should contain.
> >
> > Which of both values should the table contain? Seems straightforward: it
> > should contain the value with the highest timestamp, with
> > non-deterministic behavior when two timestamps are the same.
> >
> >
> > On Wed, 26 Jul 2017 at 9:42, Matthias J. Sax <matthias@confluent.io> wrote:
> >
> >> Merging two tables does not make too much sense because each table might
> >> contain an entry for the same key. So it's unclear, which of both values
> >> the merged table should contain.
> >>
> >> KTable.toStream() is just a semantic change and has no runtime overhead.
> >>
> >> -Matthias
> >>
> >>
> >> On 7/26/17 1:34 PM, Sameer Kumar wrote:
> >>> Hi,
> >>>
> >>> Is there a way I can merge two KTables just like I have in KStreams api.
> >>> KBuilder.merge().
> >>>
> >>> I understand I can use KTable.toStream(), if I choose to use it, is there
> >>> any performance cost associated with this conversion or is it just a API
> >>> conversion.
> >>>
> >>> -Sameer.
> >>>
> >>
> >>
> >
>
>


-- 
-- Guozhang
