beam-commits mailing list archives

From "Thomas Groh (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-1372) OutputTimeFn and Accumulating Mode is Confusing
Date Wed, 01 Feb 2017 22:20:51 GMT
Thomas Groh created BEAM-1372:
---------------------------------

             Summary: OutputTimeFn and Accumulating Mode is Confusing
                 Key: BEAM-1372
                 URL: https://issues.apache.org/jira/browse/BEAM-1372
             Project: Beam
          Issue Type: Bug
          Components: beam-model
            Reporter: Thomas Groh


See [here| https://github.com/tgroh/beam/commit/2238df334a368ce1a41e14ee616be954c5430c73]
for an example pipeline

The timestamp assigned to a pane does not depend on the accumulation mode of the windowing
strategy. As a result, elements that carry their own timestamps cannot be safely reassigned
to those timestamps after a GroupByKey if more than one pane could have been produced, regardless
of the {{OutputTimeFn}}. The first example pipeline demonstrates two PCollections where the
elements within the last PCollection cannot be reassigned to their timestamps, even though
we are using {{OutputTimeFn#outputAtEarliestInputTimestamp}} and

With a more complex windowing strategy such as sessions, this is even more confusing. A session
that spans more than one of the downstream windows and is produced in multiple panes will be
assigned to later and later windows as more panes are produced. Thus, a pipeline that produces
session windows and wishes to group the sessions by the point at which they started must only
ever produce a single pane per session.
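
The drift described above can be illustrated with a small stand-alone simulation. This is
only a sketch and does not use the Beam API: the class and method names are hypothetical, and
it assumes that each pane's timestamp is the earliest timestamp among the elements that arrived
since the previous firing, while the pane contents accumulate across firings. The real behaviour
depends on the runner and trigger in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of pane timestamps; not part of the Beam SDK.
class PaneTimestampSketch {

    /** A fired pane: its output timestamp plus the element timestamps it contains. */
    static final class Pane {
        final long timestamp;
        final List<Long> contents;

        Pane(long timestamp, List<Long> contents) {
            this.timestamp = timestamp;
            this.contents = contents;
        }
    }

    /**
     * Fires one pane per batch of arrivals.
     * Assumption: contents accumulate across firings, but the pane timestamp is the
     * earliest timestamp among only the elements that arrived since the last firing.
     */
    static List<Pane> firePanes(List<List<Long>> arrivalsPerFiring) {
        List<Long> accumulated = new ArrayList<>();
        List<Pane> panes = new ArrayList<>();
        for (List<Long> batch : arrivalsPerFiring) {
            accumulated.addAll(batch);
            long paneTimestamp = Collections.min(batch);
            panes.add(new Pane(paneTimestamp, new ArrayList<>(accumulated)));
        }
        return panes;
    }

    /** Start of the fixed window of the given size containing a non-negative timestamp. */
    static long fixedWindowStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    public static void main(String[] args) {
        // One session whose elements arrive in three batches, each firing a pane.
        List<List<Long>> arrivals = Arrays.asList(
                Arrays.asList(3L, 8L),
                Arrays.asList(12L, 17L),
                Arrays.asList(25L));

        for (Pane pane : firePanes(arrivals)) {
            long sessionStart = Collections.min(pane.contents);
            System.out.printf(
                    "pane timestamp=%d -> downstream window starts at %d; "
                            + "session start=%d -> window starts at %d%n",
                    pane.timestamp, fixedWindowStart(pane.timestamp, 10),
                    sessionStart, fixedWindowStart(sessionStart, 10));
        }
    }
}
```

Under these assumptions, the three panes fall into downstream windows starting at 0, 10 and 20,
even though the session starts at timestamp 3 in every pane. The later panes also contain elements
older than the pane timestamp, which is why those elements cannot simply be reassigned to their
own timestamps.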



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
