spark-dev mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject spark lacks fault tolerance with dynamic partition overwrite
Date Fri, 03 Apr 2020 03:06:08 GMT
i wanted to highlight here the issue we are facing with dynamic partition
overwrite.

it seems that any task that writes to disk using this feature and that
needs to be retried fails upon retry, leading to a failure of the entire
job.
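
for context, this is the write pattern affected (a minimal sketch; the
config key and the DataFrameWriter calls are standard spark sql api, the
dataframe, partition column, and output path are illustrative):

```scala
// enable dynamic partition overwrite: only partitions present in the
// incoming data are replaced, instead of truncating the whole table
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

// a partitioned overwrite write; if a task of this job is retried
// (preemption, hardware failure, speculation), the retry fails and
// takes the job down with it
df.write
  .mode("overwrite")
  .partitionBy("date")
  .parquet("/path/to/output")
```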

we have seen this issue show up with preemption (a task gets killed by
preemption, and when it gets rescheduled it fails consistently). it can
also show up if a hardware issue causes a task to fail, or if you have
speculative execution enabled.

the relevant jiras are SPARK-30320 and SPARK-29302.

this affects spark 2.4.x and spark 3.0.0-SNAPSHOT.
writing to hive does not seem to be impacted.

best,
koert
