From issues-return-180948-apmail-flink-issues-archive=flink.apache.org@flink.apache.org Mon Jul 30 09:21:06 2018 Return-Path: X-Original-To: apmail-flink-issues-archive@minotaur.apache.org Delivered-To: apmail-flink-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 4EE8C187EE for ; Mon, 30 Jul 2018 09:21:06 +0000 (UTC) Received: (qmail 30805 invoked by uid 500); 30 Jul 2018 09:21:06 -0000 Delivered-To: apmail-flink-issues-archive@flink.apache.org Received: (qmail 30770 invoked by uid 500); 30 Jul 2018 09:21:06 -0000 Mailing-List: contact issues-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@flink.apache.org Delivered-To: mailing list issues@flink.apache.org Received: (qmail 30752 invoked by uid 99); 30 Jul 2018 09:21:06 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 30 Jul 2018 09:21:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id A2147C0351 for ; Mon, 30 Jul 2018 09:21:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.501 X-Spam-Level: X-Spam-Status: No, score=-109.501 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id id5xJYLt9cJ7 for ; Mon, 30 Jul 2018 09:21:04 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 127575F58C for ; Mon, 30 Jul 2018 09:21:03 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id DC927E2602 for ; Mon, 30 Jul 2018 09:21:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2E7132776D for ; Mon, 30 Jul 2018 09:21:01 +0000 (UTC) Date: Mon, 30 Jul 2018 09:21:01 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@flink.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (FLINK-9877) Add separate docs page for different join types in DataStream API MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/FLINK-9877?page=3Dcom.atlassian= .jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=3D1656= 1681#comment-16561681 ]=20 ASF GitHub Bot commented on FLINK-9877: --------------------------------------- zentol commented on a change in pull request #6407: [FLINK-9877][docs] Add = documentation page for different datastream joins URL: https://github.com/apache/flink/pull/6407#discussion_r206065064 =20 =20 ########## File path: docs/dev/stream/operators/joining.md ########## @@ -0,0 +1,284 @@ +--- +title: "Joining" +nav-id: streaming_joins +nav-show_overview: true +nav-parent_id: streaming_operators +nav-pos: 11 +--- + + +* toc +{:toc} + +# Window Join +A window join joins the elements of two streams that share a common key an= d lie in the same window. These windows can be defined by using a [window a= ssigner]({{ site.baseurl}}/dev/stream/operators/windows.html#window-assigne= rs) and are evaluated on elements from both of the streams. + +The elements from both sides are then passed to a user-defined `JoinFuncti= on` or `FlatJoinFunction` where the user can emit results that meet the joi= n criteria. + +The general usage can be summarized as follows: + +{% highlight java %} +stream.join(otherStream) + .where() + .equalTo() + .window() + .apply() +{% endhighlight %} + +Some notes on semantics: +- The creation of pairwise combinations of elements of the two streams beh= aves like an inner-join, meaning elements from one stream will not be emitt= ed if they don't have a corresponding element from the other stream to be j= oined with. +- Those elements that do get joined will have as their timestamp the large= st timestamp that still lies in the respective window. For example a window= with `[5, 10)` as its boundaries would result in the joined elements havin= g 9 as their timestamp. + +In the following section we are going to give an overview over how differe= nt kinds of window joins behave using some exemplary scenarios. + +## Tumbling Window Join +When performing a tumbling window join, all elements with a common key and= a common tumbling window are joined as pairwise combinations and passed on= to a `JoinFunction` or `FlatJoinFunction`. Because this behaves like an in= ner join, elements of one stream that do not have elements from another str= eam in their tumbling window are not emitted! + + + +As illustrated in the figure, we define a tumbling window with the size of= 2 milliseconds, which results in windows of the form `[0,1], [2,3], ...`. = The image shows the pairwise combinations of all elements in each window wh= ich will be passed on to the `JoinFunction`. Note that in the tumbling wind= ow `[6,7]` nothing is emitted because no elements exist in the green stream= to be joined with the orange elements =E2=91=A5 and =E2=91=A6. + +
+
+{% highlight java %} +import org.apache.flink.api.java.functions.KeySelector; +import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTim= eWindows; +import org.apache.flink.streaming.api.windowing.time.Time; +=20 +... + +DataStream orangeStream =3D ... +DataStream greenStream =3D ... + +orangeStream.join(greenStream) + .where() + .equalTo() + .window(TumblingEventTimeWindows.of(Time.seconds(2))) + .apply (new JoinFunction () { + @Override + public String join(Integer first, Integer second) { + return first + "," + second; + } + }); + {% endhighlight %} +
+
+ +{% highlight scala %} +import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTime= Windows; +import org.apache.flink.streaming.api.windowing.time.Time; + +... + +val orangeStream: DataStream[Integer] =3D ... +val greenStream: DataStream[Integer] =3D ... + +orangeStream.join(greenStream) + .where(elem =3D> /* select key */) + .equalTo(elem =3D> /* select key */) + .window(TumblingEventTimeWindows.of(Time.milliseconds(2))) + .apply { (e1, e2) =3D> e1 + "," + e2 } + {% endhighlight %} + +
+
+ +## Sliding Window Join +When performing a sliding window join, all elements with a common key and = common sliding window are joined are pairwise combinations and passed on to= the `JoinFunction` or `FlatJoinFunction`. Elements of one stream that do n= ot have elements from the other stream in the current sliding window are no= t emitted! Note that some elements might be joined in one sliding window bu= t not in another! + + + +In this example we are using sliding windows with a size of two millisecon= ds and slide them by one millisecond, resulting in the sliding windows `[-1= , 0],[0,1],[1,2],[2,3], =E2=80=A6`. The= joined elements below the x-axis are the ones that are passed to the `Join= Function` for each sliding window. Here you can also see how for example th= e orange =E2=91=A1 is joined with the green =E2=91=A2 in the window `[2,3]`= , but is not joined with anything in the window `[1,2]`. + +
+
+ +{% highlight java %} +import org.apache.flink.api.java.functions.KeySelector; +import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTime= Windows; +import org.apache.flink.streaming.api.windowing.time.Time; + +... + +DataStream orangeStream =3D ... +DataStream greenStream =3D ... + +orangeStream.join(greenStream) + .where() + .equalTo() + .window(SlidingEventTimeWindows.of(Time.milliseconds(2) /* size */, Ti= me.milliseconds(1) /* slide */)) + .apply (new JoinFunction () { =20 Review comment: remove space before `(` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. =20 For queries about this service, please contact Infrastructure at: users@infra.apache.org > Add separate docs page for different join types in DataStream API > ----------------------------------------------------------------- > > Key: FLINK-9877 > URL: https://issues.apache.org/jira/browse/FLINK-9877 > Project: Flink > Issue Type: Sub-task > Components: Documentation > Reporter: Florian Schmidt > Assignee: Florian Schmidt > Priority: Major > Labels: pull-request-available > Fix For: 1.6.0 > > > https://github.com/apache/flink/pull/6407 -- This message was sent by Atlassian JIRA (v7.6.3#76005)