spark-user mailing list archives

From Abhijeet Kumar <>
Subject Join happening after watermark time
Date Thu, 06 Dec 2018 08:41:45 GMT
Hello Team,

I’m using watermarks to join two streams, as shown below:

val order_wm = order_details.withWatermark("tstamp_trans", "20 seconds")
val invoice_wm = invoice_details.withWatermark("tstamp_trans", "20 seconds")
val join_df = order_wm
  .join(invoice_wm, order_wm.col("s_order_id") === invoice_wm.col("order_id"))

My understanding is that, with the above code, each stream's rows are kept for 20 seconds
after they arrive. However, when I send a record on one stream now and the matching record
on the other stream more than 20 seconds later, the two are still joined. It seems the data
is held in memory even after the watermark has passed. I also tried waiting 45 seconds, and
those records were joined too.
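
For comparison, here is a minimal sketch of the same join with an explicit event-time range in the join condition. As far as the Structured Streaming guide describes it, a watermark tracks event time (the largest timestamp seen minus the delay) rather than wall-clock time, and an inner stream-stream join can only discard buffered rows when the join condition also bounds the event times. The column names `order_ts` and `invoice_ts`, the object `BoundedStreamJoin`, and the 20-second gap are assumptions made for illustration, not part of the original code. The sketch also assumes `s_order_id` exists only in the order stream and `order_id` only in the invoice stream, and it needs Spark SQL on the classpath.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.expr

object BoundedStreamJoin {
  // Builds the join predicate: key equality plus an event-time range.
  // order_ts and invoice_ts are hypothetical names for the renamed
  // tstamp_trans columns; they are not in the original post.
  def joinCondition(maxGapSeconds: Int): String =
    s"""s_order_id = order_id AND
       |invoice_ts >= order_ts AND
       |invoice_ts <= order_ts + interval $maxGapSeconds seconds""".stripMargin

  // Takes the two streaming DataFrames from the post as parameters.
  def joinWithTimeRange(orderDetails: DataFrame, invoiceDetails: DataFrame): DataFrame = {
    // Rename the timestamp columns so the SQL condition is unambiguous,
    // then declare the watermark on the renamed column.
    val orderWm = orderDetails
      .withColumnRenamed("tstamp_trans", "order_ts")
      .withWatermark("order_ts", "20 seconds")
    val invoiceWm = invoiceDetails
      .withColumnRenamed("tstamp_trans", "invoice_ts")
      .withWatermark("invoice_ts", "20 seconds")

    // Assumed rule: an invoice matches only if its event time falls
    // within 20 seconds after the order's event time.
    orderWm.join(invoiceWm, expr(joinCondition(20)))
  }
}
```

With a condition like this, an invoice whose event time is 45 seconds after the order's would not match, whatever the wall-clock arrival time. Whether the 20-second window is the right business rule is a separate question.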

This has left me confused about how watermarks work.

Thank you,
Abhijeet Kumar