spark-user mailing list archives

From jelmer <jkupe...@gmail.com>
Subject Missing accumulator data when a task is speculated and the original task fails with TaskCommitDenied
Date Thu, 12 Aug 2021 08:34:02 GMT
Hi,

I am using Spark 2.4.0.cloudera2 and I have a job that reads a small number
of files, which results in an RDD with 5 partitions.

I also have an accumulator that I update at the end of a map partition call
(when the iterator

What I've observed is that if a task is speculated and the original task
fails with TaskCommitDenied, then the counts collected in the accumulator
for that partition are somehow lost.
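
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual job code: the names `recordCount` and `AccumulatorSketch`, the sample data, and the local master are all hypothetical, and it assumes the accumulator is updated once the partition's iterator is exhausted. It uses `count()` as the action for brevity, whereas TaskCommitDenied would only arise with an action that commits output.

```scala
import org.apache.spark.sql.SparkSession

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the real job runs on a cluster.
    val spark = SparkSession.builder()
      .appName("accumulator-sketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical accumulator that counts records seen per partition.
    val recordCount = sc.longAccumulator("recordCount")

    // Stand-in for the small set of input files: an RDD with 5 partitions.
    val rdd = sc.parallelize(1 to 100, numSlices = 5)

    val mapped = rdd.mapPartitions { iter =>
      var seen = 0L
      new Iterator[Int] {
        private var flushed = false

        override def hasNext: Boolean = {
          val more = iter.hasNext
          // Assumption: update the accumulator once, when the iterator is exhausted.
          if (!more && !flushed) {
            recordCount.add(seen)
            flushed = true
          }
          more
        }

        override def next(): Int = {
          seen += 1
          iter.next()
        }
      }
    }

    // Accumulator updates only reach the driver after an action has run.
    mapped.count()
    println(s"records counted: ${recordCount.value}")

    spark.stop()
  }
}
```

Because the update happens inside a transformation rather than an action, Spark's documentation does not guarantee it is applied exactly once per partition when tasks are re-executed or speculated.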

I've been reading articles outlining how data could be sent twice in case
of speculative tasks but I haven't read anything about accumulators losing
data.

Does anyone have any idea what could be the reason for this?
