flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates
Date Tue, 27 Jun 2017 16:21:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065056#comment-16065056 ]

ASF GitHub Bot commented on FLINK-6969:

Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/api/queryConfig.scala
    @@ -92,6 +100,27 @@ class StreamQueryConfig private[table] extends QueryConfig {
    +  /**
    +    * Specifies a deferred computation time for deferred computation, i.e., fires the
    --- End diff --
    Specifies an offset for the point in time when the first result of a time-based computation
is computed. For example, a tumbling window of one hour that ends at 13:00 would usually compute
its first result at 13:00. With a firstResultTimeOffset of 15 minutes, the first result would
be computed at 13:15.
    A positive firstResultTimeOffset parameter can be used to include late-arriving records
in the result of an event-time based computation. Negative offset values are not supported.
    Later, a negative offset will allow computing early results, i.e., an offset of -45 minutes
would compute the first, early result of the hourly tumbling window that ends at 13:00
at 12:15.
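
    The arithmetic in the examples above could be sketched roughly as follows. This is only an illustration: the object `FirstResultTime` and its methods are hypothetical names and are not part of the pull request or of the Flink API.

```scala
import java.time.{Duration, LocalTime}

// Hypothetical helper for illustration only; not part of Flink or this PR.
object FirstResultTime {
  // The first result is computed at the window end shifted by the offset.
  def firstResultTime(windowEnd: LocalTime, offset: Duration): LocalTime =
    windowEnd.plus(offset)

  // Negative offsets (early results) are described as not supported for now.
  def isSupported(offset: Duration): Boolean = !offset.isNegative
}
```

With a window end of 13:00, an offset of 15 minutes gives 13:15, and an offset of -45 minutes would give 12:15 once negative values are allowed.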

> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
> Deferred computation is a strategy to deal with late-arriving data and to avoid updates
of previous results. Instead of computing a result as soon as possible (i.e., when a
corresponding watermark is received), deferred computation adds a configurable amount of
slack time in which late data is accepted before the result is computed. For example, instead
of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation
interval of 15 minutes to compute the result at a quarter past each full hour.
> This approach adds latency but can reduce the number of updates, especially in use cases where
the user cannot influence the generation of watermarks. It is also useful if the data is emitted
to a system that cannot update results (files or Kafka). The deferred computation interval
should be configured via the {{QueryConfig}}.
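
> As a minimal sketch of the idea, assuming a simplified model in which a window's first result is computed once the watermark reaches the window end plus the deferred computation interval: the object `DeferredWindow`, its methods, and the use of non-negative epoch milliseconds are assumptions made for illustration and do not reflect the actual implementation.

```scala
// Simplified model with hypothetical names; all values are epoch milliseconds
// and timestamps are assumed to be non-negative.
object DeferredWindow {
  // End (exclusive) of the tumbling window that contains the timestamp.
  def windowEnd(timestamp: Long, windowSize: Long): Long =
    timestamp - (timestamp % windowSize) + windowSize

  // A record still contributes to the first result if the watermark has not
  // yet reached the window end plus the deferred computation interval.
  def isIncluded(timestamp: Long, watermark: Long, windowSize: Long, deferral: Long): Boolean =
    watermark < windowEnd(timestamp, windowSize) + deferral
}
```

In this model, a record stamped 12:59 that arrives when the watermark is at 13:10 misses the hourly window without a deferral, but is still included with a deferral of 15 minutes.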

