spark-dev mailing list archives

From real-cheng-su <chen...@fb.com.INVALID>
Subject [SS] full outer stream-stream join
Date Fri, 20 Nov 2020 19:15:30 GMT
Hi,
 
Stream-stream join in Spark Structured Streaming currently supports the INNER,
LEFT OUTER, RIGHT OUTER and LEFT SEMI join types. It does not yet support
FULL OUTER join, and we are working on adding it in
https://github.com/apache/spark/pull/30395 .
 
Given that LEFT OUTER and RIGHT OUTER stream-stream joins are already supported,
the code needed for FULL OUTER join is quite straightforward:

* For a left side input row, check if there's a match in the right side state
store. If there's a match, output the joined row; otherwise output nothing. Put
the row in the left side state store.
* For a right side input row, check if there's a match in the left side state
store. If there's a match, output the joined row; otherwise output nothing. Put
the row in the right side state store.
* State store eviction: evict rows below the watermark from the left/right side
state stores, and output any evicted rows that were never matched (a
combination of left outer and right outer join).
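
The three steps above can be sketched roughly as follows. This is only a toy,
in-memory illustration in plain Python, not the code in the PR: the class and
method names (FullOuterJoinSketch, process_left, process_right, evict) are
hypothetical, the join condition is assumed to be simple key equality, and the
"matched" flag kept next to each stored row is an assumption about how
never-matched rows could be tracked.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class StoredRow:
    key: Any
    value: Any
    event_time: int
    matched: bool = False  # assumed flag: has this row ever been joined?


class FullOuterJoinSketch:
    """Toy model of the two state stores; not Spark's real implementation."""

    def __init__(self) -> None:
        self.left_state: List[StoredRow] = []
        self.right_state: List[StoredRow] = []

    def _process(self, row: StoredRow, own: List[StoredRow],
                 other: List[StoredRow], row_is_left: bool) -> List[Tuple]:
        out = []
        # Look for matches in the opposite side's state store.
        for candidate in other:
            if candidate.key == row.key:
                row.matched = True
                candidate.matched = True
                pair = (row.value, candidate.value) if row_is_left \
                    else (candidate.value, row.value)
                out.append(pair)
        # Always keep the input row in its own side's state store.
        own.append(row)
        return out

    def process_left(self, key: Any, value: Any, event_time: int) -> List[Tuple]:
        return self._process(StoredRow(key, value, event_time),
                             self.left_state, self.right_state, True)

    def process_right(self, key: Any, value: Any, event_time: int) -> List[Tuple]:
        return self._process(StoredRow(key, value, event_time),
                             self.right_state, self.left_state, False)

    def evict(self, watermark: int) -> List[Tuple[Optional[Any], Optional[Any]]]:
        """Drop rows below the watermark; emit the ones never matched."""
        out = []
        for state, is_left in ((self.left_state, True), (self.right_state, False)):
            keep = []
            for row in state:
                if row.event_time < watermark:
                    if not row.matched:
                        # Null-pad the missing side, as in an outer join.
                        out.append((row.value, None) if is_left
                                   else (None, row.value))
                else:
                    keep.append(row)
            state[:] = keep
        return out
```

In this sketch, feeding a left row and then a right row with the same key
emits one joined pair from process_right, while a row that never finds a
partner is emitted with None on the other side only when evict() passes its
event time.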

Given that FULL OUTER join consumes the same amount of state store space as
INNER/LEFT OUTER/RIGHT OUTER join and is easy to add, I don't see any reason
from a system perspective why FULL OUTER join should not be added.

Is there any major blocker to adding FULL OUTER stream-stream join? I am asking
on the dev mailing list in case we are missing anything beyond PR review
participation. Thanks.

Cheng Su


