beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob
Date Fri, 02 Sep 2016 20:44:20 GMT


ASF GitHub Bot commented on BEAM-604:

GitHub user markflyhigh opened a pull request:

    [BEAM-604] Use Watermark to Finish Streaming Job in TestDataflowRunner

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [x] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](
     - Add checkMaxWatermark() function in TestDataflowRunner, so that when testing on streaming
pipeline with bounded input, the job can be canceled as soon as all watermark reach to max
value (by default is -2). Then, verification steps can be executed.
     - Add WindowedWordCountIT as a basic example of testing on streaming job.
     - Add non-terminated check before canceling steaming job.
     - Create verifier for WindowedWordCountIT. (BigQuery verifier)

You can merge this pull request into a Git repository by running:

    $ git pull streaming-wait-until-max-watermark

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #916
commit bb8a6f8a360a8f263fe0ad625ab4159d645c42ab
Author: Mark Liu <>
Date:   2016-09-02T20:22:37Z

    [BEAM-604] Use Watermark to Finish Streaming Job in TestDataflowRunner


> Use Watermark Check Streaming Job Finish in DataflowPipelineJob
> ---------------------------------------------------------------
>                 Key: BEAM-604
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>            Reporter: Mark Liu
>            Assignee: Mark Liu
>            Priority: Minor
> Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner
can't handle this case. Need to update TestDataflowRunner so that streaming integration test
such as WindowedWordCountIT can run with it.
> Implementation:
> Query watermark of each step and wait until all watermarks set to MAX then cancel the
> Update:
> Suggesting by [], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish.
Thus, all dataflow streaming jobs with bounded input will take advantage of this change and
are canceled automatically when watermarks reach to max value. Also Dataflow runners can keep
simple and free from handling batch and streaming two cases.

This message was sent by Atlassian JIRA

View raw message