spark-issues mailing list archives
From Herman van Hövell (Jira) <j...@apache.org>
Subject [jira] [Updated] (SPARK-30108) Add robust accumulator for observable metrics
Date Mon, 09 Dec 2019 17:13:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Herman van Hövell updated SPARK-30108:
--------------------------------------
    Description: 
Spark accumulators reflect the work that has been done, not the data that has been processed.
There are situations where one tuple can be processed multiple times, e.g. task/stage retries,
speculation, determining partition ranges for a global ordering, etc. For observed metrics we
need the value of the accumulator to be based on the data and not on the processing.

The current aggregating accumulator is already robust to some of these issues (such as task
failure), but we need to add additional checks to make it foolproof.
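The idea can be sketched in plain Scala. This is a hypothetical illustration, not Spark's actual AggregatingAccumulator: the accumulator keys each contribution by partition id, so when a partition is re-processed (task retry, speculation) its new contribution overwrites the old one instead of being added a second time. The class and method names here are invented for the sketch.

```scala
import scala.collection.mutable

// Hypothetical sketch: a metric accumulator whose value depends only
// on the data, not on how many times each partition was processed.
class PartitionKeyedSum {
  private val perPartition = mutable.Map.empty[Int, Long]

  // Record the value produced for one partition. A retry or
  // speculative attempt for the same partition overwrites the
  // previous entry rather than double-counting it.
  def update(partitionId: Int, value: Long): Unit =
    perPartition(partitionId) = value

  // Merge across partitions; each partition contributes exactly once.
  def value: Long = perPartition.values.sum
}

object Demo {
  def main(args: Array[String]): Unit = {
    val acc = new PartitionKeyedSum
    acc.update(0, 10L)
    acc.update(1, 5L)
    acc.update(0, 10L) // speculative re-execution of partition 0
    println(acc.value) // 15, not 25
  }
}
```

A naive `Long` counter would report 25 here because partition 0 ran twice; keying by partition makes the result reflect the data alone, which is the robustness property the issue asks for.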

> Add robust accumulator for observable metrics
> ---------------------------------------------
>
>                 Key: SPARK-30108
>                 URL: https://issues.apache.org/jira/browse/SPARK-30108
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Herman van Hövell
>            Priority: Major
>
> Spark accumulators reflect the work that has been done, not the data that has been
processed. There are situations where one tuple can be processed multiple times, e.g. task/stage
retries, speculation, determining partition ranges for a global ordering, etc. For observed metrics
we need the value of the accumulator to be based on the data and not on the processing.
> The current aggregating accumulator is already robust to some of these issues (such as
task failure), but we need to add additional checks to make it foolproof.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

