flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates
Date Wed, 28 Jun 2017 14:48:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066612#comment-16066612 ]

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Hi @sunjincheng121, that's very good input for the discussion!
    
    1. I would like to avoid unnecessarily changing the API and deprecating methods and
parameters. IMO, `firstResultTimeOffset` is a better name because it describes that there
is an offset in time to compute the first result. Also, if we had a dedicated parameter for
each feature, the API would allow configuring early firing and deferred computation at the
same time, so we would need to check whether both parameters are set and throw an exception.
If we only have a single parameter `firstResultTimeOffset`, there can be no invalid configuration
because you either compute an early or a deferred result.
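
To illustrate point 1, here is a minimal Python sketch, not Flink code. The class and field names (`TwoParamConfig`, `SingleParamConfig`, `early_firing_ms`, `deferred_ms`, `first_result_time_offset_ms`) are hypothetical, and the sign convention (negative offset = early result, positive offset = deferred result, both relative to the window end) is an assumption that this thread does not define.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TwoParamConfig:
    """Hypothetical design: one dedicated parameter per feature."""
    early_firing_ms: Optional[int] = None
    deferred_ms: Optional[int] = None

    def __post_init__(self):
        # With two parameters, setting both is an invalid combination
        # that has to be detected and rejected explicitly.
        if self.early_firing_ms is not None and self.deferred_ms is not None:
            raise ValueError(
                "early firing and deferred computation cannot both be set")


@dataclass(frozen=True)
class SingleParamConfig:
    """Hypothetical design: one signed offset relative to the window end."""
    first_result_time_offset_ms: int = 0

    def first_result_time(self, window_end_ms: int) -> int:
        # Assumed convention: negative = early result, positive = deferred.
        return window_end_ms + self.first_result_time_offset_ms


HOUR = 3_600_000
deferred = SingleParamConfig(first_result_time_offset_ms=15 * 60_000)
early = SingleParamConfig(first_result_time_offset_ms=-60_000)
print(deferred.first_result_time(HOUR))  # window end plus 15 minutes
print(early.first_result_time(HOUR))     # window end minus 1 minute
```

With the single signed value there is no combination of settings that needs a validity check, which is the argument made above.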
    
    2.1. Yes, it is possible to emit records with correct (i.e., window end) timestamps and
this is what we should do.
    
    2.2. I think we cannot rely on other operators supporting deferred computation.
First, this would force us to implement this for all time-based operators (like over windows
and, later, joins). Second, if we convert a table with a deferred group window aggregation back
into a DataStream, all records of the stream would be late. I just had a discussion with @aljoscha
about this issue. Holding back watermarks is not really possible in Flink, and adjusting watermarks
after a window operator adds latency incrementally.
    
    @sunjincheng121 I think you are right about the adjustment of the watermarks. The only
approach that doesn't add more latency than the offset is to subtract the offset from the
watermarks at all sources. We would not need to add any custom triggers and all operators
(including future ones) would immediately support deferred computation.
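
As a rough illustration of the source-side watermark adjustment, here is a small Python sketch rather than Flink code. The function names `shift_source_watermark` and `window_fires` are hypothetical, and the firing rule (a window fires once the watermark reaches the window end) is a simplification assumed for this example.

```python
MINUTE = 60_000
HOUR = 60 * MINUTE


def shift_source_watermark(watermark_ms: int, offset_ms: int) -> int:
    """Watermark a source would emit after subtracting the offset."""
    return watermark_ms - offset_ms


def window_fires(window_end_ms: int, watermark_ms: int) -> bool:
    """Simplified rule (assumption): fire once the watermark reaches the end."""
    return watermark_ms >= window_end_ms


OFFSET = 15 * MINUTE
window_end = 11 * HOUR  # tumbling window covering 10:00 to 11:00

# The source has seen data up to 11:00, but emits a watermark of 10:45.
at_full_hour = shift_source_watermark(11 * HOUR, OFFSET)
# The source has seen data up to 11:15 and emits a watermark of 11:00.
at_quarter_past = shift_source_watermark(11 * HOUR + OFFSET, OFFSET)

print(window_fires(window_end, at_full_hour))     # False
print(window_fires(window_end, at_quarter_past))  # True
```

The window operator itself is unchanged in this sketch; only the watermark it receives is shifted, which is why no custom trigger would be needed.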
    
    What do you think @sunjincheng121 and @wuchong?


> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid updates
of previous results. Instead of computing a result as soon as it is possible (i.e., when a
corresponding watermark is received), deferred computation adds a configurable amount of
slack time in which late data is accepted before the result is computed. For example, instead
of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation
interval of 15 minutes to compute the result at quarter past each full hour.
> This approach adds latency but can reduce the number of updates, esp. in use cases where
the user cannot influence the generation of watermarks. It is also useful if the data is emitted
to a system that cannot update results (files or Kafka). The deferred computation interval
should be configured via the {{QueryConfig}}.
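
The 1 hour / 15 minutes example from the description can be worked through with a short Python sketch. This is not Flink code: the function names are hypothetical, timestamps are milliseconds since midnight, and the acceptance rule (a record is still included while the watermark has not reached the deferred result time) is an assumption made for the example.

```python
MINUTE = 60_000
HOUR = 60 * MINUTE


def tumbling_window_end(event_ts_ms: int, size_ms: int) -> int:
    """End of the tumbling window that contains the event timestamp."""
    start = event_ts_ms - (event_ts_ms % size_ms)
    return start + size_ms


def deferred_result_time(event_ts_ms: int, size_ms: int, deferral_ms: int) -> int:
    """Time at which the window result is computed when deferred."""
    return tumbling_window_end(event_ts_ms, size_ms) + deferral_ms


def is_accepted(event_ts_ms: int, watermark_ms: int,
                size_ms: int, deferral_ms: int) -> bool:
    """Assumed rule: accept while the watermark is before the result time."""
    return watermark_ms < deferred_result_time(event_ts_ms, size_ms, deferral_ms)


event = 10 * HOUR + 50 * MINUTE     # event time 10:50, window 10:00 to 11:00
watermark = 11 * HOUR + 5 * MINUTE  # record arrives when the watermark is 11:05

print(deferred_result_time(event, HOUR, 15 * MINUTE) == 11 * HOUR + 15 * MINUTE)  # True
print(is_accepted(event, watermark, HOUR, 15 * MINUTE))  # True: still within the slack
print(is_accepted(event, watermark, HOUR, 0))            # False: result already computed
```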



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
