spark-user mailing list archives

From Patrick Wendell <pwend...@gmail.com>
Subject Re: Spark does not retry failed tasks initiated by hadoop
Date Thu, 23 Jan 2014 00:44:52 GMT
What makes you think it isn't retrying the task? By default it tries
three times... it only prints the error once, though. In this case, if
your cluster doesn't have any datanodes, it's likely that the task failed
several times.
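
For reference, the per-task retry limit is controlled by the
spark.task.maxFailures property. Below is a minimal sketch of raising it,
assuming a Spark version that accepts the setting through SparkConf (on
older releases the same property can be set as a Java system property
before the SparkContext is created); the app name and the value 8 are
just placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Allow each task to fail up to 8 times before the stage is aborted.
    // spark.task.maxFailures is the retry knob; 8 is an arbitrary example.
    val conf = new SparkConf()
      .setAppName("retry-example")
      .set("spark.task.maxFailures", "8")
    val sc = new SparkContext(conf)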

On Wed, Jan 22, 2014 at 4:04 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
> Hi,
>
> I've written about this issue before, but there was no reply.
>
> It seems when a task fails due to hadoop io errors, spark does not retry
> that task, and only reports it as a failed task, carrying on the other
> tasks. As an example:
>
> WARN ClusterTaskSetManager: Loss was due to java.io.IOException
> java.io.IOException: All datanodes x.x.x.x:50010 are bad. Aborting...
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3096)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2589)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2793)
>
>
> I think almost all Spark applications need to have 0 failed tasks in order
> to produce a meaningful result.
>
> These IO errors are usually not repeatable, and they might not occur after a
> retry. Is there a setting in Spark to enforce a retry upon such failed tasks?
