flink-issues mailing list archives

From aljoscha <...@git.apache.org>
Subject [GitHub] flink pull request #3191: [FLINK-5529] [FLINK-4752] [docs] Improve / extends...
Date Wed, 25 Jan 2017 17:32:13 GMT
Github user aljoscha commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3191#discussion_r97833288
  
    --- Diff: docs/dev/windows.md ---
    @@ -758,8 +831,33 @@ input
     val input: DataStream[T] = ...
     
     input
    -    .windowAll(<window assigner>)
    +    .keyBy(<key selector>)
    +    .window(<window assigner>)
    +    .allowedLateness(<time>)
         .<windowed transformation>(<window function>)
     {% endhighlight %}
     </div>
     </div>
    +
    +<span class="label label-info">Note</span> When using the `GlobalWindows` window assigner no
    +data is ever considered late because the end timestamp of the global window is `Long.MAX_VALUE`.
    +
    +### Late elements considerations
    +
    +When specifying an allowed lateness greater than 0, the window along with its content is kept after the watermark passes
    +the end of the window. In these cases, when a late but not dropped element arrives, it will trigger another firing for the
    +window. These firings are called `late firings`, as they are triggered by late events, in contrast to the `main firing`,
    +which is the first firing of the window. In case of session windows, late firings can further lead to merging of windows,
    +as they may "bridge" the gap between two pre-existing, unmerged windows.
    +
    +<span class="label label-info">Attention</span> You should be aware that the elements emitted by a late firing should be treated as updated results of a previous computation, i.e., your data stream will contain multiple results for the same computation. Depending on your application, you need to take these duplicated results into account or deduplicate them.
    +
    +## Useful state size considerations
    +
    +Windows can be defined over long periods of time (such as days, weeks, or months) and therefore accumulate very large state. There are a couple of rules to keep in mind when estimating the storage requirements of your windowing computation:
    + 
    +1. Flink creates one copy of each element per window to which it belongs. Given this, tumbling windows keep one copy of each element (an element belongs to exactly one window unless it is dropped late). In contrast, sliding windows create several copies of each element, as explained in the [Window Assigners](#window-assigners) section. Hence, a sliding window of size 1 day and slide 1 second might not be a good idea.
    +
    +2. `FoldFunction` and `ReduceFunction` can significantly reduce the storage requirements, as they eagerly aggregate elements and store only one value per window. In contrast a `WindowFunction` must accumulate all elements.
    --- End diff --
    
    In contrast, just using a `WindowFunction` requires accumulating all elements.
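
    To make the two storage rules in the diff concrete, here is a minimal plain-Java sketch. It does not use the Flink API: `WindowStateSketch`, `copiesPerElement`, `EagerSum` and `BufferedSum` are hypothetical names introduced only for illustration, and the copy count assumes the window size is an exact multiple of the slide.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowStateSketch {

    /**
     * Number of sliding windows one element falls into, and hence the number
     * of copies kept for it. Assumes size is an exact multiple of slide.
     */
    public static long copiesPerElement(long sizeMillis, long slideMillis) {
        if (slideMillis <= 0 || sizeMillis % slideMillis != 0) {
            throw new IllegalArgumentException("size must be a positive multiple of slide");
        }
        return sizeMillis / slideMillis;
    }

    /** Eager aggregation in the spirit of a reduce: one running value per window. */
    public static final class EagerSum {
        private long sum;

        public void add(long value) { sum += value; }

        public long result() { return sum; }

        public int storedValues() { return 1; }
    }

    /** Buffering in the spirit of a plain window function: every element is kept. */
    public static final class BufferedSum {
        private final List<Long> buffer = new ArrayList<>();

        public void add(long value) { buffer.add(value); }

        public long result() {
            long sum = 0;
            for (long v : buffer) {
                sum += v;
            }
            return sum;
        }

        public int storedValues() { return buffer.size(); }
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long second = 1000L;
        // Size 1 day, slide 1 second: each element is copied into 86400 windows.
        System.out.println("copies per element: " + copiesPerElement(day, second));

        EagerSum eager = new EagerSum();
        BufferedSum buffered = new BufferedSum();
        for (long v = 1; v <= 1000; v++) {
            eager.add(v);
            buffered.add(v);
        }
        // Same result, very different amount of state held for the window.
        System.out.println("eager:    result=" + eager.result() + " stored=" + eager.storedValues());
        System.out.println("buffered: result=" + buffered.result() + " stored=" + buffered.storedValues());
    }
}
```

    Both variants produce the same sum, but the buffering one holds every element until the window is evaluated, which is the cost the sentence above refers to.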


