Hello John,
1. If a task completes its operation, it notifies the driver. The driver may not receive the message because of a network problem, and will think the task is still running. Then the child stage won't be scheduled?
Spark's fault tolerance policy is simple: if a task fails or an executor is lost, Spark runs the task (and any dependent tasks) again. Spark tries to minimize the number of tasks it has to recompute, so usually only a small part of the data is recomputed.
So in your case, the driver simply schedules the task on another executor and continues to the next stage when it receives the data.
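To make the idea concrete, here is a toy model in plain Python of "retry the task on another executor". It is only a sketch, not Spark's scheduler code: `run_with_retries`, `flaky_task`, and the executor names are made up for illustration. The default of 4 mirrors Spark's `spark.task.maxFailures` setting.

```python
def run_with_retries(task, executors, max_failures=4):
    """Try the task on successive executors until one attempt succeeds.

    Toy model only: max_failures plays the role of spark.task.maxFailures.
    """
    failures = 0
    for executor in executors:
        try:
            return task(executor)
        except RuntimeError:
            failures += 1
            if failures >= max_failures:
                raise  # give up, as the job would be failed at this point
    raise RuntimeError("no executors left to try")


def flaky_task(executor):
    # Hypothetical task: the first executor is "lost", the second one works.
    if executor == "exec-1":
        raise RuntimeError("executor lost")
    return f"partition computed on {executor}"


result = run_with_retries(flaky_task, ["exec-1", "exec-2"])
print(result)
```

Note that the task body may run more than once, which is what matters for the delivery guarantee discussed further down.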
2. How does Spark guarantee that the downstream task receives the shuffle data completely? In fact, I can't find a checksum for blocks in Spark. For example, the upstream task may shuffle 100 MB of data, but the downstream task may receive only 99 MB because of the network. Can Spark verify that the data was received completely based on its size?
Spark uses compression with checksumming for shuffle data, so it should know when the data is corrupt and initiate a recomputation.
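As a rough illustration of how a length plus checksum catches both the "99 MB instead of 100 MB" case and silent corruption, here is a small Python sketch. The framing format and the names `frame` and `unframe` are my own assumptions for the example; this is not Spark's actual block format.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with an 8-byte length and a 4-byte CRC32 (hypothetical format)."""
    header = len(payload).to_bytes(8, "big") + zlib.crc32(payload).to_bytes(4, "big")
    return header + payload


def unframe(block: bytes) -> bytes:
    """Return the payload, or raise ValueError if the block is truncated or corrupt."""
    if len(block) < 12:
        raise ValueError("block is shorter than the header")
    expected_len = int.from_bytes(block[:8], "big")
    expected_crc = int.from_bytes(block[8:12], "big")
    payload = block[12:]
    if len(payload) != expected_len:
        raise ValueError("length mismatch: block was truncated or padded")
    if zlib.crc32(payload) != expected_crc:
        raise ValueError("checksum mismatch: block is corrupt")
    return payload


block = frame(b"shuffle data")
assert unframe(block) == b"shuffle data"

try:
    unframe(block[:-1])  # simulate losing the last byte on the wire
except ValueError as err:
    print("detected:", err)
```

On the receiving side, a failed check like this is the point at which the data would be fetched or recomputed again.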
As for your question in the subject:
All of this means that Spark supports at-least-once processing. There is no way that I know of to ensure exactly-once processing. You can try to minimize more-than-once situations by updating your offsets as soon as possible, but that does not eliminate the problem entirely.
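One common way to live with at-least-once processing is to make the output idempotent, so that a replayed record overwrites its earlier copy instead of duplicating it. This does not turn the processing itself into exactly-once; it only makes replays harmless for the stored result. A minimal sketch in Python, where `IdempotentSink` and the (partition, offset) key are assumptions made for the example:

```python
class IdempotentSink:
    """Toy sink keyed by (partition, offset); replays overwrite rather than duplicate."""

    def __init__(self):
        self.rows = {}

    def write(self, partition, offset, value):
        # Writing the same key twice leaves exactly one row behind.
        self.rows[(partition, offset)] = value


sink = IdempotentSink()
batch = [(0, 0, "a"), (0, 1, "b")]

for record in batch:
    sink.write(*record)

# Simulate a failure before the offsets were saved: the whole batch is replayed.
for record in batch:
    sink.write(*record)

print(len(sink.rows))
```

Whether this works for you depends on your sink being able to upsert by such a key.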
Hope this helps,