spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <>
Subject [jira] [Updated] (SPARK-26655) Support multiple aggregates in Structured Streaming append mode
Date Mon, 16 Mar 2020 22:54:07 GMT


Dongjoon Hyun updated SPARK-26655:
    Affects Version/s:     (was: 3.0.0)

> Support multiple aggregates in Structured Streaming append mode
> ---------------------------------------------------------------
>                 Key: SPARK-26655
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Arun Mahadevan
>            Priority: Major
>         Attachments: Watermarks and multiple aggregates in Spark strucutred streaming_v1.pdf
> Right now, multiple (chained) aggregates are not supported in Structured Streaming.
> However, in append mode an aggregate is emitted only after the watermark passes the
> threshold (e.g. the window boundary), and the emitted value is not affected by further
> late data. So it is possible to chain multiple aggregates in 'Append' output mode without
> worrying about retractions.
> However, the current event-time watermarks in Structured Streaming are tracked at a
> global level, and this does not work when aggregates are chained.
> We need to track the watermarks at the individual operator level so that each operator
> can make progress independently and not rely on a global min or max value.
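
The difference between a global watermark and per-operator watermarks can be sketched with a toy model in plain Python. This is not Spark code and not the design from the attached PDF: the WindowSum class, the run helper, the window sizes (10 and 30) and the batch data are all hypothetical, and the per-operator rule used here (each operator judges lateness against its own previous watermark) is just one possible reading of the description above.

```python
from collections import defaultdict


class WindowSum:
    """Toy append-mode tumbling-window sum (hypothetical, not a Spark class)."""

    def __init__(self, size):
        self.size = size
        self.state = defaultdict(int)  # window start -> running sum
        self.watermark = 0             # this operator's own watermark

    def process(self, rows, new_watermark, late_threshold):
        # Drop rows whose window already closed according to late_threshold.
        for ts, value in rows:
            start = ts - ts % self.size
            if start + self.size > late_threshold:
                self.state[start] += value
        self.watermark = new_watermark
        # Append mode: emit each window once, after the watermark passes its end.
        closed = sorted(s for s in self.state if s + self.size <= self.watermark)
        return [(s, self.state.pop(s)) for s in closed]


def run(batches, per_operator):
    first, second = WindowSum(10), WindowSum(30)  # two chained aggregates
    results = []
    for rows, watermark in batches:
        # Global tracking: both operators judge lateness by the batch watermark.
        # Per-operator tracking (assumed rule): each operator judges lateness
        # by the watermark it had reached before this batch.
        late1 = first.watermark if per_operator else watermark
        late2 = second.watermark if per_operator else watermark
        partial = first.process(rows, watermark, late1)
        results += second.process(partial, watermark, late2)
    return results


# Each batch: (rows as (event_time, value), watermark in effect for the batch).
batches = [
    ([(1, 1), (5, 1), (12, 1)], 0),
    ([(25, 1), (33, 1)], 10),
    ([(41, 1)], 30),
]
global_result = run(batches, per_operator=False)
per_operator_result = run(batches, per_operator=True)
print(global_result)        # [(0, 2)]
print(per_operator_result)  # [(0, 4)]
```

Four events fall into the window [0, 30). With the single global watermark, the second aggregate treats the partial sums for [10, 20) and [20, 30) as late, because they are emitted only once the watermark has already reached 30, so it reports 2. With each operator tracking its own watermark, those partial sums are accepted and the result is 4.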

This message was sent by Atlassian Jira

