spark-dev mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [SS] full outer stream-stream join
Date Mon, 23 Nov 2020 07:31:33 GMT
Adding my rationale here: I asked for this thread to be raised on the dev
mailing list to find out whether there was a reason full outer join was
left out when left/right outer join was added.

This is historical knowledge that I don't have. Most likely only a limited
number of folks can answer, and I hope we can get some historical
context.

Note that I don't object to the change. I just want to make sure we don't
miss something.

On Sat, Nov 21, 2020 at 4:15 AM real-cheng-su <chengsu@fb.com.invalid>
wrote:

> Hi,
>
> Stream-stream join in Spark Structured Streaming currently supports the
> INNER, LEFT OUTER, RIGHT OUTER and LEFT SEMI join types. It does not
> support FULL OUTER join, and we are working on adding it in
> https://github.com/apache/spark/pull/30395 .
>
> Given that LEFT OUTER and RIGHT OUTER stream-stream joins are supported,
> the code needed for FULL OUTER join is actually quite straightforward:
>
> * For a left side input row, check if there's a match in the right side
> state store. If there's a match, output the joined row; otherwise output
> nothing. Put the row in the left side state store.
> * For a right side input row, check if there's a match in the left side
> state store. If there's a match, output the joined row; otherwise output
> nothing. Put the row in the right side state store.
> * State store eviction: evict rows below the watermark from the
> left/right side state stores, and output the rows that were never
> matched (a combination of left outer and right outer join).
>
> FULL OUTER join consumes the same amount of space in the state store as
> INNER/LEFT OUTER/RIGHT OUTER join, and it is pretty easy to add. I don't
> see any reason from a system perspective why FULL OUTER join should not
> be added.
>
> Is there any major blocker to adding FULL OUTER stream-stream join? I am
> asking on the dev mailing list, in addition to the PR review, in case we
> have missed anything. Thanks.
>
> Cheng Su
>
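
For readers following the thread, the three steps listed in the quoted
message can be sketched as a small in-memory simulation. This is only a toy
model in plain Python, not the code in the PR or Spark's actual state store
API: the class name FullOuterJoinSketch, the methods process and evict, and
the per-row "matched" flag are all hypothetical names chosen for
illustration. It also assumes a plain equality join on a key and ignores
time-range conditions and late data.

```python
from collections import defaultdict


class FullOuterJoinSketch:
    """Toy model of a full outer stream-stream join (hypothetical, not Spark code)."""

    def __init__(self):
        # side -> key -> list of [event_time, value, matched_flag]
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side, key, event_time, value):
        """Handle one input row: emit joined rows, then store the row."""
        other = "right" if side == "left" else "left"
        out = []
        matched = False
        for entry in self.state[other][key]:
            entry[2] = True  # remember that the stored row found a match
            matched = True
            left_val, right_val = (
                (value, entry[1]) if side == "left" else (entry[1], value)
            )
            out.append((key, left_val, right_val))
        # No match means nothing is emitted now; the row is kept in state.
        self.state[side][key].append([event_time, value, matched])
        return out

    def evict(self, watermark):
        """Drop rows below the watermark; emit never-matched ones padded with None."""
        out = []
        for side in ("left", "right"):
            for key in list(self.state[side]):
                kept = []
                for entry in self.state[side][key]:
                    event_time, value, matched = entry
                    if event_time >= watermark:
                        kept.append(entry)
                    elif not matched:
                        if side == "left":
                            out.append((key, value, None))
                        else:
                            out.append((key, None, value))
                if kept:
                    self.state[side][key] = kept
                else:
                    del self.state[side][key]
        return out
```

In this sketch, the only difference from a one-sided outer join is that
evict emits unmatched rows from both sides instead of just one, which is
the "combination of left outer and right outer join" mentioned in the
quoted message.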
