spark-issues mailing list archives

From "Apache Spark (Jira)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-30666) Reliable single-stage accumulators
Date Fri, 01 May 2020 16:50:02 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-30666:
------------------------------------

    Assignee:     (was: Apache Spark)

> Reliable single-stage accumulators
> ----------------------------------
>
>                 Key: SPARK-30666
>                 URL: https://issues.apache.org/jira/browse/SPARK-30666
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Enrico Minack
>            Priority: Major
>
> This proposes a pragmatic improvement to allow for reliable single-stage accumulators.
> Under the assumption that a given stage / partition / RDD produces identical results,
> non-deterministic code produces identical accumulator increments on success. Rerunning
> a partition for any reason should therefore always produce the same increments for that
> partition on success.
> With this pragmatic approach, the increments from an individual partition / task are
> merged into the accumulator on the driver side only the first time that partition
> completes successfully. This is useful for accumulators registered with
> {{countFailedValues == false}}. Hence, the accumulator aggregates each successful
> partition exactly once.
> The implementations require extra memory that scales with the number of partitions.
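
The merge rule described above can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not the implementation proposed in the ticket and not Spark's API: the class name FirstWinsAccumulator, its merge method, and the succeeded flag are hypothetical names chosen for illustration. It assumes a simple integer sum and that the driver knows which partition each task result belongs to.

```python
class FirstWinsAccumulator:
    """Hypothetical driver-side sum that merges each partition at most once."""

    def __init__(self):
        self.value = 0
        # Extra driver memory: one entry per partition merged so far,
        # so the footprint grows with the number of partitions.
        self._merged_partitions = set()

    def merge(self, partition_id, increment, succeeded=True):
        """Merge one task's increment; return True if it was applied."""
        if not succeeded:
            # Mirrors countFailedValues == false: failed attempts are ignored.
            return False
        if partition_id in self._merged_partitions:
            # A rerun of an already merged partition: under the stated
            # assumption it carries the same increment, so drop it.
            return False
        self._merged_partitions.add(partition_id)
        self.value += increment
        return True


acc = FirstWinsAccumulator()
acc.merge(0, 10)                   # partition 0 succeeds
acc.merge(1, 3, succeeded=False)   # partition 1 fails part-way, ignored
acc.merge(1, 5)                    # partition 1 retried, succeeds
acc.merge(0, 10)                   # partition 0 recomputed, ignored
print(acc.value)
```

Without the per-partition bookkeeping, the recomputed partition 0 would be counted twice. The set of merged partition ids is the extra memory the description refers to.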



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


