beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-564) Update source framework so that remaining and consumed number of split points can be reported
Date Wed, 01 Feb 2017 19:53:51 GMT

    [ https://issues.apache.org/jira/browse/BEAM-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848863#comment-15848863
] 

ASF GitHub Bot commented on BEAM-564:
-------------------------------------

GitHub user chamikaramj opened a pull request:

    https://github.com/apache/beam/pull/1890

    [BEAM-564] Updates Python source API to allow reporting consumed and remaining number
of split points

    With this update Python BoundedSource/RangeTracker API can report consumed and remaining
number of split points while performing a source read operations. Java SDK source API already
supports reporting these signals.
    
    These signals can be used by runner implementations, for example, to perform scaling decisions.
    
    This provides a slightly simplified API compared to previous PR #881.
    
    Main differences compared to #881 are following.
    
    (1) set_done()/done() methods were removed from the RangeTracker interface.
    
    Downside is that RangeTracker will be unable to provide the signal that all records have
been consumed. I think this signal is unnecessary since a runner can detect that anyways since
the reader loop of the source ends at that point.
    
    (2)
    Callback between BoundedSource and RangeTracker was changed from reporting remaining number
of split points to reporting unclaimed number of split points. This makes the implementation
of the callback simpler for source authors.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chamikaramj/beam limited_parallelism_updated

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/1890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1890
    
----
commit 8c5761f1647b11c9b60919bdecfa2ac77f4e491d
Author: Chamikara Jayalath <chamikara@google.com>
Date:   2017-01-05T03:10:09Z

    Updates Python SDK source API so that sources can report limited parallelism signals.
    
    With this update Python BoundedSource/RangeTracker API can report consumed and remaining
number of split points while performing a source read operations, similar to Java SDK sources.
    
    These signals can be used by runner implementations, for example, to perform scaling decisions.

----


> Update source framework so that remaining and consumed number of split points can be
reported
> ---------------------------------------------------------------------------------------------
>
>                 Key: BEAM-564
>                 URL: https://issues.apache.org/jira/browse/BEAM-564
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
>
> We have to update Python SDK source framework so that sources can report consumed and
remaining number of split points. Runners can use this information to determine how many times
a given source can be split into and parallelize reading accordingly.
> Corresponding API for JAVA SDK is here:
> https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java#L258



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message