flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From gyfora <...@git.apache.org>
Subject [GitHub] flink pull request: StreamWindow abstraction + modular window comp...
Date Fri, 13 Feb 2015 19:47:17 GMT
GitHub user gyfora opened a pull request:

    https://github.com/apache/flink/pull/395

    StreamWindow abstraction + modular window computations

    This PR introduces a rework to the current windowing semantics, by major API and Runtime
improvements.
    
    The core new abstraction is the StreamWindow which encapsulates the contents of a window
in the data stream. Now the WindowedDataStream symbolises transformations on a DataStream
of StreamWindows. This allows the users to properly collect, and work with the results of
the window transformations.
    
    API changes:
    
    Previously the call `ds.window(...).groupBy(...).reduce(...)` would have returned a stream
of reduced values by key in the given window. There was no way of properly obtaining all the
(key, value) pairs from the window reduce calls just the flattened windows. This also made
it impossible to apply more than one transformations on the same window without having to
re-descritize it.
    
    The new api introduces several changes here:
    
    `ds.window(...).groupBy(...).reduceWindow(...)` returns a WindowedDataStream which than
can be further transformed by applying transformations on the resulting window. The user can
also call `.flatten()` or `.getDiscretizedStream()` to obtain the DataStream of the flattened
values or the DataStream of StreamWindows.
    
    The reduce and reduceGroup functions have been renamed to reduceWindow and mapWindow respectively
to emphasise the functionality.
    
    Runtime changes:
    
    The windowing runtime has been completely reworked to make the components modular for
future improvements:
    
    DataStream -> StreamDiscretizer -> (PreAggregator) -> WindowPartitioner ->
Transformation (reduce/map) -> WindowMerge 
    
    This allows easier special-casing for specific window types for maximal performance, such
as applying special preaggregators or parallel stream discretisation for global windows (time)
as well.
    
    Further tasks after merge:
    -Add pre-aggregators for time windows
    -Add pre-aggregator for sliding count windows
    -Improve test coverage for all pre-implemented policies
    -Further optimise partitioning-reduce parallelism
    
    I will add one last commit which will update the javadocs and the streaming guide.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mbalassi/flink window

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/395.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #395
    
----
commit f41a237ee05dd13caf6ddb805a917fc63d20c1a8
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-04T16:14:51Z

    [streaming] StreamWindow abstraction added with typeinfo and tests

commit 6b7243bda432e7cd7d0d3dff37871a67f7c3fbc0
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-04T16:26:16Z

    [FLINK-1176] [streaming] Added invokables for modular windowing tranformations

commit 3df8b613801c2dd7da12a7467acabd21cac14c8e
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-04T16:27:34Z

    [FLINK-1176] [streaming] WindowedDataStream rework for new windowing runtime

commit df54364fe3bfa38a4eaafd5a7e774087a9996aaf
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-08T12:09:34Z

    [streaming] StreamDiscretizer rework to support only 1 eviction and trigger for robustness
+ test cleanup

commit f14936ea62ebc999c765f536d79ba91f3709256b
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-08T12:10:00Z

    [streaming] Integration test added for windowed operations

commit fca43c132f6988440d179f5922f3c150dc2dc9c9
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-08T12:46:47Z

    [streaming] Streaming scala api fix for window changes + example cleanup

commit 8109c85a987fcb312d2575962abcf4b6f971b122
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-08T21:59:22Z

    [streaming] GroupedTimeDiscretizer added for lighter time policy thread management

commit 8df13b8d9f117633ac4c7537b3c03d8d2f4bbf40
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-10T11:21:09Z

    [streaming] WindowBuffer interface added for preaggregator logic + simple tumbling prereducer

commit bae76b7b310d313d2296d91e910cbf5d0d13c344
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-11T11:57:06Z

    [streaming] Reimplemented StreamWindow.split to avoid dependency issues

commit 2a88e35cd59a9f8cdc2e460f6ec89c03e430308d
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-12T19:45:34Z

    [streaming] TumblingGroupedPreReducer added + new tests for windowing

commit 0dba2c5cfc1bd9a8dd51a0ac630fe0a0ac3f9f5d
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-13T10:03:08Z

    [streaming] WindowMapFunction added + Streaming package structure cleanup

commit 592a6ec90ade022ef4758ef5cc678d17b33a7afd
Author: Gyula Fora <gyfora@apache.org>
Date:   2015-02-13T18:57:55Z

    [FLINK-1539] [streaming] Remove calls to uninitalized runtimecontexts

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message