beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-1925) Make DoFn invocation logic of Python SDK more extensible
Date Mon, 08 May 2017 20:47:04 GMT


ASF GitHub Bot commented on BEAM-1925:

GitHub user sb2nov opened a pull request:

    [BEAM-1925] validate DoFn at pipeline creation time

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`.
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](
    R: @robertwb PTAL 
    Same changes as without the new file creation.
    The performance seems identical to master on the benchmark gist.

You can merge this pull request into a Git repository by running:

    $ git pull BEAM-1925-validate-dofn-2

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2963
commit 6ee4c243b7f1a942f66f4f6f5fb19b32fb334a7c
Author: Sourabh Bajaj <>
Date:   2017-05-08T20:33:15Z

    [BEAM-1925] validate DoFn at pipeline creation time


> Make DoFn invocation logic of Python SDK more extensible
> --------------------------------------------------------
>                 Key: BEAM-1925
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
> DoFn invocation logic of Python SDK is currently in DoFnRunner class.
> At initialization of this, we parse a DoFn and create local state. We use this state
when invoking DoFn methods process, start_bundle, and finish_bundle. For example, we store
a list of  ArgPlaceholder objects within the state of DoFnRunner to facilitate invocation
of process method.
> We will need to extend this functionality when adding new features to DoFn class (for
example to support Splittable DoFn [1]). So I think it's good to refactor this code to be
more extensible. 
> I think a good approach for this is to add DoFnInvoker and DoFnSignature classes similar
to Java SDK [2].
> In this approach:
> A DoFnSignature captures the signature of a DoFn including methods and arguments.
> A DoFnInvoker implements a particular way DoFn methods will be executed (initially we'll
have simple and per-window invokers [3]).
> A runner uses DoFnRunner to execute methods of a given DoFn. At initialization, DoFnRunner
crates a DoFnSignature and a DoFnInvoker for the given DoFn.
> DoFnSignature and DoFnInvoker methods will be used by SplittableDoFn implementation as
> [1]
> [2]
> [3]

This message was sent by Atlassian JIRA

View raw message