beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2950) Provide implicit access to State
Date Wed, 13 Sep 2017 01:43:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16164021#comment-16164021 ]

Kenneth Knowles commented on BEAM-2950:
---------------------------------------

-1

We considered this and opted not to do it. Here are the criteria considered for the user-facing
State API: https://s.apache.org/beam-state#heading=h.j7e7f226dsrr and another criterion we
left off the document was readability of end-user code.

So I think this proposal has the following flaws:

# To start with, it isn't very useful. PCollectionView.get() makes it easier to wrap a DoFn
with side inputs to produce a composite with side inputs, because side inputs are passed in
from outside the composite, and PCollectionViews are globally rooted values. StateSpec is
not passed in, but declared within the DoFn, rooted in the primitive ParDo, so there's no
extensibility gained. There is no "double wiring" problem like we have with side inputs. It
doesn't make any sense for a composite to have state.
# Direct access at first appears "more intuitive" because to a newcomer it "looks like" normal
field access. But in fact it is nothing like normal field access so this intuition is misleading
and should not be encouraged. So it is actually less readable because your intuitive reading
is wrong.
# This design would miss the validation aspect. One way it is different from normal mutatey
programming is that there are many places where it is illegal to reference state, such as StartBundle/FinishBundle,
or when passing it to another object. This proposal would turn those into dynamic failures at best,
or in the worst case data corruption (the runner fails to catch illegal access, and permits some
thread-global context to leak).
# It is actually mandatory that we are always able to detect state, as it is essentially a
different primitive (VanillaParDo, SplittableParDo, and StatefulParDo are executed totally
differently, even in the mathematical sense).
# As for timers, we need to associate the ID with a method. An alternative is serialized callbacks,
so if you include timers you need to include a full design for that.
# A runner can't automatically prefetch, because it doesn't know which state is used by which
methods.
# Magic by mutating stuff into place is just less readable / more error prone.

There's a very strong burden of proof / design doc / dev list consensus to move in this direction.
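
To make the validation, detection, and prefetch points above concrete, here is a minimal sketch of why explicit declaration matters. It uses only the JDK and is not Beam code: the `StateId` and `StartBundle` annotations, the `StateSpec` and `ValueState` classes, and the helper methods are all hypothetical stand-ins, loosely modeled on the shape of the Java SDK's annotations. The assumption is that state is declared on a field and requested as an annotated method parameter, so a runner-like tool can inspect the class without running any user code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ExplicitStateSketch {
  // Hypothetical stand-ins, NOT the real Beam annotations or classes.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.FIELD, ElementType.PARAMETER})
  @interface StateId { String value(); }

  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface StartBundle {}

  static final class StateSpec {}
  static final class ValueState { int value; }

  // State is declared on fields and requested per method via parameters.
  static class CountingFn {
    @StateId("count") private final StateSpec countSpec = new StateSpec();
    @StateId("seen") private final StateSpec seenSpec = new StateSpec();

    @StartBundle public void startBundle() {}

    public void process(String element, @StateId("count") ValueState count) {
      count.value++;
    }
  }

  // Illegal in this sketch: state requested where no key/window is in scope.
  static class BadFn {
    @StateId("count") private final StateSpec countSpec = new StateSpec();

    @StartBundle public void startBundle(@StateId("count") ValueState count) {}
  }

  // Detects whether (and which) state a class declares, without running it.
  static Set<String> declaredState(Class<?> fn) {
    Set<String> ids = new TreeSet<>();
    for (Field f : fn.getDeclaredFields()) {
      StateId id = f.getAnnotation(StateId.class);
      if (id != null) ids.add(id.value());
    }
    return ids;
  }

  // Maps method name -> state IDs it requests; this is what prefetching needs.
  static Map<String, Set<String>> stateUsedByMethod(Class<?> fn) {
    Map<String, Set<String>> used = new TreeMap<>();
    for (Method m : fn.getDeclaredMethods()) {
      for (Parameter p : m.getParameters()) {
        StateId id = p.getAnnotation(StateId.class);
        if (id != null) used.computeIfAbsent(m.getName(), k -> new TreeSet<>()).add(id.value());
      }
    }
    return used;
  }

  // Rejects illegal state references up front instead of failing dynamically.
  static List<String> validate(Class<?> fn) {
    Set<String> declared = declaredState(fn);
    List<String> errors = new ArrayList<>();
    for (Method m : fn.getDeclaredMethods()) {
      for (Parameter p : m.getParameters()) {
        StateId id = p.getAnnotation(StateId.class);
        if (id == null) continue;
        if (m.isAnnotationPresent(StartBundle.class)) {
          errors.add(m.getName() + ": state \"" + id.value() + "\" not allowed in @StartBundle");
        } else if (!declared.contains(id.value())) {
          errors.add(m.getName() + ": state \"" + id.value() + "\" is not declared");
        }
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    System.out.println(declaredState(CountingFn.class));
    System.out.println(stateUsedByMethod(CountingFn.class));
    System.out.println(validate(BadFn.class));
  }
}
```

With implicit access through something like a thread-global lookup, none of these three helpers could be written: the class would carry no per-method record of which state it touches, so the checks would have to happen while user code runs.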

> Provide implicit access to State
> --------------------------------
>
>                 Key: BEAM-2950
>                 URL: https://issues.apache.org/jira/browse/BEAM-2950
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Kenneth Knowles
>
> https://github.com/apache/beam/pull/3814 provides implicit access to side inputs (without
> a ProcessContext). Luke suggests to have the same for State and, I suppose, timers. We could
> also have it for PipelineOptions: in any given user code invocation, these are all unambiguous.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
