beam-commits mailing list archives

From dkulp <>
Subject [GitHub] incubator-beam pull request #1025: [BEAM-674] Gridfs Source refactoring
Date Thu, 29 Sep 2016 16:36:40 GMT
GitHub user dkulp opened a pull request:

    [BEAM-674] Gridfs Source refactoring

    Refactor of the GridFS-based Source, based on feedback from @jkff.
    The BoundedSource is now a source of ObjectIds, and a separate DoFn converts/parses
    each GridFSDBFile into usable chunks.
    Added a test case for splitting.
    Variables not needed by the Source are moved out onto the transform instead.
    Optimized the non-split case a bit by not querying all the ObjectIds up front.
    Optimized the unit tests by setting up test data once per class instead of per test.
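
The split described above (a source that only emits ids, followed by a separate per-id parsing step) can be sketched roughly as follows. This is a plain-Java illustration with no Beam or MongoDB dependencies, not the code in this pull request: the class `GridFsSplitSketch`, the in-memory `FILES` map standing in for GridFS, and the methods `listIds`, `split`, and `parse` are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GridFsSplitSketch {
    // Hypothetical stand-in for a GridFS bucket: file id -> file contents.
    static final Map<String, String> FILES = new LinkedHashMap<>();
    static {
        FILES.put("id-1", "a\nb");
        FILES.put("id-2", "c");
    }

    // Stage 1 (source-like): emit only the ids, never the file contents.
    static List<String> listIds() {
        return new ArrayList<>(FILES.keySet());
    }

    // Splitting the source then amounts to partitioning the id list into bundles.
    static List<List<String>> split(List<String> ids, int bundleSize) {
        List<List<String>> bundles = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += bundleSize) {
            int end = Math.min(i + bundleSize, ids.size());
            bundles.add(new ArrayList<>(ids.subList(i, end)));
        }
        return bundles;
    }

    // Stage 2 (DoFn-like): look up one file by id and parse it into records.
    // Here "parsing" is assumed to mean splitting the contents into lines.
    static List<String> parse(String id) {
        return Arrays.asList(FILES.get(id).split("\n"));
    }

    public static void main(String[] args) {
        for (List<String> bundle : split(listIds(), 1)) {
            for (String id : bundle) {
                System.out.println(id + " -> " + parse(id));
            }
        }
    }
}
```

Because the first stage handles only small ids, it stays cheap to split, and the per-file parsing logic can change without touching the source.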

You can merge this pull request into a Git repository by running:

    $ git pull gridfs-t2

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1025
commit 5aad971bcd1d32ba06cec9d4870e7aa9e9dc17f5
Author: Daniel Kulp <>
Date:   2016-09-29T02:44:37Z

    Split BoundedSource into a BoundedSource<ObjectID> and a DoFn<...>

commit 2fc219cdd33e89d65d457dd3767bd378ff1111c0
Author: Daniel Kulp <>
Date:   2016-09-29T13:03:31Z

    Optimize reading for non-split case

commit e58fc61868988cc40c325d913fca37b26e3db99c
Author: Daniel Kulp <>
Date:   2016-09-29T13:18:17Z

    Use objectId timestamp

commit ed73d77b21651d6ef1d8cf2892dc267794d52d10
Author: Daniel Kulp <>
Date:   2016-09-29T13:57:44Z

    Pull parser out of BoundedSource, add maxSkew

commit 277667527cf0a23704b3ae3d05b2c8e2c2bcea3c
Author: Daniel Kulp <>
Date:   2016-09-29T14:48:42Z

    Add test case for the split

commit db30aabac4629ae167e4ede73de79257b4a93336
Author: Daniel Kulp <>
Date:   2016-09-29T15:00:44Z

    Don't need the generic on the Source and Reader

commit 1cdb2ce716b7e020c5306494b414b5bb136abb24
Author: Daniel Kulp <>
Date:   2016-09-29T16:29:51Z

    Rename maxSkew to allowedTimestampSkew to match other DoFn's


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.
