beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (BEAM-197) Incremental join
Date Mon, 15 May 2017 18:37:04 GMT

     [ https://issues.apache.org/jira/browse/BEAM-197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles reassigned BEAM-197:
------------------------------------

    Assignee:     (was: Kenneth Knowles)

> Incremental join
> ----------------
>
>                 Key: BEAM-197
>                 URL: https://issues.apache.org/jira/browse/BEAM-197
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Mark Shields
>
> Consider a co-group by key over the two (streaming) collections:
>  l : PCollection<KV<K, L>>
>  r : PCollection<KV<K, R>>
> Each processElement sees a K, Iterable<L> and Iterable<R>.
> If the underlying trigger only allows a single PaneInfo.Timing.ON_TIME pane, then it is
> trivial to calculate the traditional cross-product, including any of the inner/outer join
> combinations should Iterable<L> or Iterable<R> be empty.
> However, if the underlying trigger supports speculative (i.e. PaneInfo.Timing.EARLY) or
> late (i.e. PaneInfo.Timing.LATE) panes, then the corresponding speculative or late output
> panes are awkward to compute.
> (left_already_seen ++ new_left)  X (right_already_seen ++ new_right)
>   ==
> (left_already_seen X right_already_seen) ++
> (new_left X right_already_seen) ++
> (left_already_seen X new_right) ++
> (new_left X new_right)
> Currently, the barrier between 'already seen' and 'new' must be maintained for both left
> and right in per-window state, which suppresses some optimizations.
> This bug is for finding a cleaner way to express this combinator.
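
For illustration only, the decomposition above can be sketched in plain Java without any Beam dependency. The class name IncrementalJoin and the method onPane are hypothetical and are not part of the Beam API; the two lists stand in for the per-window state that holds the 'already seen' elements for one key. Each call returns only the three terms that involve new elements, since (left_already_seen X right_already_seen) was emitted by earlier panes. This covers the inner-join case only; outer-join handling of an empty side is not attempted here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not Beam API): per-key, per-window incremental inner join. */
class IncrementalJoin<L, R> {
    // Stand-ins for per-window state: everything seen in earlier panes.
    private final List<L> leftSeen = new ArrayList<>();
    private final List<R> rightSeen = new ArrayList<>();

    /** Returns only the pairs that are new in this pane, then advances the barrier. */
    List<Map.Entry<L, R>> onPane(List<L> newLeft, List<R> newRight) {
        List<Map.Entry<L, R>> delta = new ArrayList<>();
        // new_left X right_already_seen
        for (L l : newLeft) {
            for (R r : rightSeen) {
                delta.add(new AbstractMap.SimpleImmutableEntry<>(l, r));
            }
        }
        // left_already_seen X new_right
        for (L l : leftSeen) {
            for (R r : newRight) {
                delta.add(new AbstractMap.SimpleImmutableEntry<>(l, r));
            }
        }
        // new_left X new_right
        for (L l : newLeft) {
            for (R r : newRight) {
                delta.add(new AbstractMap.SimpleImmutableEntry<>(l, r));
            }
        }
        // Move the barrier: the new elements are 'already seen' for the next pane.
        leftSeen.addAll(newLeft);
        rightSeen.addAll(newRight);
        return delta;
    }
}
```

Concatenating the lists returned across all panes yields the full cross-product of everything seen so far, with each pair appearing exactly once. The need to keep leftSeen and rightSeen around between panes is the state-maintenance burden the issue describes.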



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
