kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sachin Mittal <sjmit...@gmail.com>
Subject What timestamp is used by streams when doing windowed joins
Date Sat, 07 Dec 2019 04:20:25 GMT
Hi,
I have noticed some issues when doing stream to stream windowed joins.
Looks like my joined stream does not include all the records.

Say I am doing join like this:
stream1.join(
            stream2,
            (lv, rv) -> ...,
            JoinWindows.of(Duration.ofMinutes(5)),
           ....)
What I have checked from the docs is that it will join 2 records within the
specified window.
However its not clear as what time it would take for each record?
Would it be
1.  event-time or
2. processing-time or
3. ingestion-time

I am right now using default configuration for
log.message.timestamp.type = CreateTime and default.timestamp.extractor

>From the docs I gather is that in default case it uses event-time.
So does it mean that there has to be a timestamp field in the record which
is to be extracted by custom timestamp extractor?

Also in downstream when streams application actually writes (produces) new
record types, do we need to provide timestamp extractor for all such record
types
so the next process in the pipeline can pick up the timestamp to do the
windowed operations?

Also when and how processing time is used at all by streams application?

Finally say I don't want to worry about if timestamp is set by the
producers, is it better to simply set
log.message.timestamp.type =  LogAppendTime

Thanks
Sachin

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message