beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-65) SplittableDoFn
Date Thu, 04 Aug 2016 18:16:20 GMT


Eugene Kirpichov commented on BEAM-65:

Proposal announced on Beam dev mailing list

The proposal itself is at

> SplittableDoFn
> --------------
>                 Key: BEAM-65
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Daniel Halperin
>            Assignee: Eugene Kirpichov
>            Priority: Minor
> SplittableDoFn is a proposed enhancement for "dynamically splittable work" to the Beam model.
> Among other things, it would allow a unified implementation of bounded/unbounded sources
> with dynamic work rebalancing and the ability to express multiple scalable steps (e.g., glob
> expansion -> file sizing & parsing -> splitting files into independently-processable
> blocks) via composition rather than inheritance.
> This would make it much easier to implement many types of sources and to modify and reuse
> existing sources. It would also improve the scalability of the Beam model by moving things like
> splitting a source from the control plane (where it is today: glob -> List<FileBasedSource>
> sent over service APIs) into the data plane (PCollection<Glob> -> PCollection<FileName>
> -> ...).
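
The glob -> file name -> block chain described above can be illustrated with a minimal sketch in plain Python. It does not use any Beam API and is not the proposed SplittableDoFn design. The in-memory `file_sizes` mapping and the function names `expand_glob`, `split_into_blocks`, and `run_steps` are hypothetical, chosen only to show each step as a small function that is composed with the next rather than folded into one source class.

```python
import fnmatch
from typing import Dict, Iterator, List, Tuple

# A block is (file name, start offset, end offset); offsets are in bytes.
Block = Tuple[str, int, int]


def expand_glob(pattern: str, file_sizes: Dict[str, int]) -> Iterator[str]:
    """Step 1: glob -> file names, against a hypothetical in-memory listing."""
    for name in sorted(file_sizes):
        if fnmatch.fnmatchcase(name, pattern):
            yield name


def split_into_blocks(name: str, size: int, block_size: int) -> Iterator[Block]:
    """Step 2: one file -> independently processable byte ranges."""
    for start in range(0, size, block_size):
        yield (name, start, min(start + block_size, size))


def run_steps(patterns: List[str], file_sizes: Dict[str, int],
              block_size: int) -> List[Block]:
    """Compose the steps: each stage consumes the previous stage's output."""
    blocks: List[Block] = []
    for pattern in patterns:
        for name in expand_glob(pattern, file_sizes):
            blocks.extend(split_into_blocks(name, file_sizes[name], block_size))
    return blocks


if __name__ == "__main__":
    sizes = {"logs/a.txt": 250, "logs/b.txt": 100, "data/c.csv": 10}
    for block in run_steps(["logs/*.txt"], sizes, block_size=100):
        print(block)
```

Because every intermediate result is ordinary data (file names, then blocks), each stage could in principle be distributed on its own, which is the data-plane idea the description refers to.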

This message was sent by Atlassian JIRA
