spark-issues mailing list archives

From "Apache Spark (Jira)"
Subject [jira] [Assigned] (SPARK-30666) Reliable single-stage accumulators
Date Fri, 01 May 2020 16:50:02 GMT


Apache Spark reassigned SPARK-30666:

    Assignee:     (was: Apache Spark)

> Reliable single-stage accumulators
> ----------------------------------
>                 Key: SPARK-30666
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
> This proposes a pragmatic improvement to allow for reliable single-stage accumulators.
> Under the assumption that a given stage / partition / rdd produces identical results, non-deterministic
> code produces identical accumulator increments on success. Rerunning partitions for any reason
> should always produce the same increments per partition on success.
> With this pragmatic approach, increments from individual partitions / tasks are merged
> into the accumulator on the driver side only the first time each partition succeeds. This is useful
> for accumulators registered with {{countFailedValues == false}}. Hence, the accumulator aggregates
> each successful partition exactly once.
> The implementations require extra memory that scales with the number of partitions.
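
The merge rule described above can be illustrated with a small, Spark-independent sketch. It is not the code proposed in SPARK-30666 and does not use any Spark API: the class name OncePerPartitionAccumulator and its merge method are hypothetical, chosen only to show how a driver-side accumulator might ignore increments from reruns of a partition it has already merged, and why the bookkeeping grows with the number of partitions.

```python
class OncePerPartitionAccumulator:
    """Hypothetical driver-side accumulator; not part of the Spark API."""

    def __init__(self):
        self.value = 0
        # One entry per merged partition: this is the extra memory
        # that grows with the number of partitions.
        self._merged_partitions = set()

    def merge(self, partition_id, increment):
        """Merge a successful task's increment; return True if it was applied."""
        if partition_id in self._merged_partitions:
            return False  # a rerun of an already-merged partition is ignored
        self._merged_partitions.add(partition_id)
        self.value += increment
        return True


acc = OncePerPartitionAccumulator()
# (partition id, increment) pairs as successful tasks report back;
# partition 1 is rerun and reports the same increment a second time.
for partition_id, increment in [(0, 5), (1, 7), (1, 7), (2, 3)]:
    acc.merge(partition_id, increment)

total = acc.value
tracked = len(acc._merged_partitions)
```

In this sketch the rerun of partition 1 does not change the total, which relies on the stated assumption that a rerun produces the same increment as the first successful run.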
