beam-commits mailing list archives

From kennknowles <>
Subject [GitHub] beam pull request #3859: [BEAM-2884] Send portable protos for ParDo in Dataf...
Date Sun, 17 Sep 2017 22:37:33 GMT
GitHub user kennknowles opened a pull request:

    [BEAM-2884] Send portable protos for ParDo in DataflowRunner

    Follow this checklist to help us incorporate your contribution quickly and easily:
     - [x] Make sure there is a JIRA issue filed for the change (usually before you start working
on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address
just this issue, without pulling in other changes.
     - [x] Each commit in the pull request should have a meaningful subject line and body.
     - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`,
where you replace `BEAM-XXX` with the appropriate JIRA issue.
     - [x] Write a pull request description that is detailed enough to understand what the
pull request does, how, and why.
     - [x] Run `mvn clean verify` to make sure basic checks pass. A more thorough check will
be performed on your pull request automatically.
     - [x] If this contribution is large, please file an Apache Individual Contributor License
Agreement.

    Instead of just sending Dataflow a Java-serialized `DoFnInfo`, send it as part of a portable
payload nested as `PTransform` proto > `ParDoPayload` > `SdkFunctionSpec` > `FunctionSpec`.
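As a rough illustration of that nesting, here is a minimal Java sketch. It assumes simplified, hypothetical records (`FunctionSpec`, `SdkFunctionSpec`, `ParDoPayload`, `PTransform`, `DoFnInfoStandIn`) in place of the real generated protobuf classes and the real `DoFnInfo`; the actual messages carry more fields and nest somewhat differently. The URN is a made-up placeholder. The sketch shows the Java-serialized `DoFnInfo` becoming the innermost `FunctionSpec` payload on the runner side, and a harness peeling the layers back off to recover it, including the "main" output that the portable model does not describe.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ParDoPayloadSketch {

  // Hypothetical stand-in for DoFnInfo: just enough state to run a DoFn, including
  // the default "main" output tag, which has no counterpart in the portable model.
  record DoFnInfoStandIn(String doFnClassName, String mainOutputTag) implements Serializable {}

  // Hypothetical, simplified mirrors of the portable protos (not the real generated classes).
  record FunctionSpec(String urn, byte[] payload) {}
  record SdkFunctionSpec(FunctionSpec spec, String environmentId) {}
  record ParDoPayload(SdkFunctionSpec doFn) {}
  record PTransform(String uniqueName, ParDoPayload payload) {}

  // Placeholder URN for illustration only; not the URN the Java SDK actually uses.
  static final String JAVA_DOFN_URN = "urn:example:java-dofn-info";

  static byte[] javaSerialize(Serializable value) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    return bytes.toByteArray();
  }

  static Object javaDeserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  // Runner side: the Java-serialized DoFnInfo becomes the innermost FunctionSpec payload.
  static PTransform wrap(String name, DoFnInfoStandIn info) throws IOException {
    FunctionSpec spec = new FunctionSpec(JAVA_DOFN_URN, javaSerialize(info));
    // The environment is left blank, matching the PR's note that some pieces are unfilled.
    SdkFunctionSpec sdkSpec = new SdkFunctionSpec(spec, "");
    return new PTransform(name, new ParDoPayload(sdkSpec));
  }

  // Harness side: unwrap each layer and deserialize the full DoFnInfo again.
  static DoFnInfoStandIn unwrap(PTransform transform)
      throws IOException, ClassNotFoundException {
    FunctionSpec spec = transform.payload().doFn().spec();
    if (!JAVA_DOFN_URN.equals(spec.urn())) {
      throw new IllegalArgumentException("Unexpected URN: " + spec.urn());
    }
    return (DoFnInfoStandIn) javaDeserialize(spec.payload());
  }

  public static void main(String[] args) throws Exception {
    PTransform transform =
        wrap("MyParDo", new DoFnInfoStandIn("com.example.MyDoFn", "main"));
    DoFnInfoStandIn recovered = unwrap(transform);
    System.out.println(recovered.mainOutputTag());
  }
}
```

The design point the sketch tries to capture is that the portable layers are additive: a runner that only understands the protos can read the outer structure, while a Java harness still gets the complete `DoFnInfo` back byte-for-byte.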

    A couple of notes:
     - This is built on #3858, which is trivial. We could possibly do a single worker dance for
both together, but it is nice for them to be separate commits.
     - The full `DoFnInfo` is still sent, as the Java SDK harness depends on many of its details
and will continue to do so for the foreseeable future. For example, it carries the default "main"
output, which is not part of the portable model. The contents of `DoFnInfo` can be refined
separately.
     - There are pieces left blank, such as the environment of the `SdkFunctionSpec`.
     - Dataflow may not need quite so much context in the protos; it only needs the pieces that
the runner harness uses to construct an instruction graph for the harness.

You can merge this pull request into a Git repository by running:

    $ git pull ParDoPayload

Alternatively, you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3859

commit d8f03d7a3d749dcd279b35377e25dc8884dc5a64
Author: Kenneth Knowles <>
Date:   2017-09-16T22:16:56Z

    Move DoFnInfo to SDK util
    Previously, the DoFnInfo wrapped things just enough for Dataflow to execute a
    DoFn without much context. The Java SDK harness has the same need, and relies
    on DoFnInfo. Effectively, DoFnInfo is the UDF that the Java SDK harness

commit 478adfab4d94cf14826a107766b50a2c31ca5cc1
Author: Kenneth Knowles <>
Date:   2017-09-16T22:26:49Z

    Send ParDoPayload to Dataflow instead of just DoFnInfo


