flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates
Date Wed, 28 Jun 2017 16:07:02 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066781#comment-16066781 ]

ASF GitHub Bot commented on FLINK-6969:

Github user sunjincheng121 commented on the issue:

    @fhueske That's correct. We generate watermarks by subtracting the offset from the timestamp
(that's the SLA mechanism), so we are on the same page.
    I think the value of firstResultTimeOffset represents the maximum delay of the data
source. For example, if the TT source guarantees that data is never delayed by more than 5 seconds,
we can set firstResultTimeOffset=5 seconds.
    So the name firstResultTimeOffset is not very suitable, and I think this config item
cannot be used for the early-result feature. For example, if the maximum data delay of the TT
source is 5 seconds, we should set firstResultTimeOffset=5, but we may also want to get an early
result for a window (of course, we must then deal with retracted/updated results).
    For now I also suggest separating the deferred and early configuration items: deferred
is a configuration item for the data source's data-delay SLA, and early is a configuration item
for emitting window results. How about slaOfdataDelay? I am not good at naming config items, but
I want to express the meaning of this configuration item clearly.
    To be honest, I'm not sure whether we are on the same page regarding the early-result
feature. If not, I think we need to discuss it in more detail. :-)
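A minimal sketch of the SLA mechanism described in the comment above, assuming the watermark is the highest event timestamp seen so far minus the maximum delay the source guarantees (similar in spirit to Flink's BoundedOutOfOrdernessTimestampExtractor). The class SlaWatermarkGenerator and the field slaOfDataDelayMillis are hypothetical names for illustration, loosely modeled on the proposed slaOfdataDelay; they are not Flink API.

```java
// Hypothetical sketch: names are illustrative, not part of Flink's API.
final class SlaWatermarkGenerator {
    // Maximum delay the source guarantees (the "SLA"), in milliseconds.
    private final long slaOfDataDelayMillis;
    // Highest event timestamp observed so far; MIN_VALUE means "no events yet".
    private long maxTimestamp = Long.MIN_VALUE;

    SlaWatermarkGenerator(long slaOfDataDelayMillis) {
        if (slaOfDataDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be non-negative");
        }
        this.slaOfDataDelayMillis = slaOfDataDelayMillis;
    }

    /** Records an event timestamp and returns the watermark after this event. */
    long onEvent(long eventTimestampMillis) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestampMillis);
        return currentWatermark();
    }

    /** Watermark = highest timestamp seen so far minus the guaranteed maximum delay. */
    long currentWatermark() {
        if (maxTimestamp == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // avoid underflow before the first event
        }
        return maxTimestamp - slaOfDataDelayMillis;
    }

    public static void main(String[] args) {
        SlaWatermarkGenerator gen = new SlaWatermarkGenerator(5_000L); // 5-second SLA
        System.out.println(gen.onEvent(10_000L)); // 5000
        System.out.println(gen.onEvent(8_000L));  // 5000: out-of-order event, max unchanged
        System.out.println(gen.onEvent(20_000L)); // 15000
    }
}
```

Because the SLA is already encoded in the watermark, any early-result setting would be a separate knob that fires before this watermark advances, which is the separation argued for in the comment.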

> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
> Deferred computation is a strategy to deal with late-arriving data and avoid updates
> of previous results. Instead of computing a result as soon as it is possible (i.e., when a
> corresponding watermark was received), deferred computation adds a configurable amount of
> slack time in which late data is accepted before the result is computed. For example, instead
> of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation
> interval of 15 minutes to compute the result at a quarter past each full hour.
> This approach adds latency but can reduce the number of updates, esp. in use cases where
> the user cannot influence the generation of watermarks. It is also useful if the data is emitted
> to a system that cannot update results (files or Kafka). The deferred computation interval
> should be configured via the {{QueryConfig}}.
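The deferral arithmetic from the issue description can be sketched in plain Java. This is an illustration under stated assumptions, not the Flink implementation: DeferredTumblingWindow and deferMillis are hypothetical names, a window [start, end) is treated as complete once the watermark reaches end, and the deferral interval is passed to the constructor instead of being read from QueryConfig.

```java
// Hypothetical sketch of deferred computation for an event-time tumbling window.
// Names are illustrative, not Flink API.
final class DeferredTumblingWindow {
    private final long sizeMillis;   // window length, e.g. 1 hour
    private final long deferMillis;  // slack added after the window end, e.g. 15 minutes

    DeferredTumblingWindow(long sizeMillis, long deferMillis) {
        if (sizeMillis <= 0 || deferMillis < 0) {
            throw new IllegalArgumentException("size must be > 0 and deferral >= 0");
        }
        this.sizeMillis = sizeMillis;
        this.deferMillis = deferMillis;
    }

    /** Start (inclusive) of the window containing ts; floorMod also handles negative ts. */
    long windowStart(long ts) {
        return ts - Math.floorMod(ts, sizeMillis);
    }

    /** End (exclusive) of the window containing ts. */
    long windowEnd(long ts) {
        return windowStart(ts) + sizeMillis;
    }

    /** Watermark value at which the window's result is emitted: end + deferral. */
    long emitTime(long ts) {
        return windowEnd(ts) + deferMillis;
    }

    /** A record is still folded into its window if the result has not been emitted yet. */
    boolean accepts(long ts, long watermark) {
        return watermark < emitTime(ts);
    }

    public static void main(String[] args) {
        long hour = 60 * 60 * 1000L;
        long minute = 60 * 1000L;
        DeferredTumblingWindow deferred = new DeferredTumblingWindow(hour, 15 * minute);
        DeferredTumblingWindow immediate = new DeferredTumblingWindow(hour, 0L);

        long event = 10 * hour + 50 * minute;    // 10:50, in window [10:00, 11:00)
        long watermark = 11 * hour + 5 * minute; // the event arrives when the watermark is at 11:05

        System.out.println(deferred.emitTime(event) == 11 * hour + 15 * minute); // true
        System.out.println(immediate.accepts(event, watermark)); // false: result already emitted at 11:00
        System.out.println(deferred.accepts(event, watermark));  // true: still within the slack
    }
}
```

With a 15-minute deferral, the 10:50 event that shows up after the watermark has passed 11:00 is still folded into the [10:00, 11:00) result, whereas a window that emits immediately would have to update or drop it. The cost is that every result is emitted 15 minutes later.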

This message was sent by Atlassian JIRA
