spark-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: Handling fatal errors of executors and decommission datanodes
Date Mon, 16 Mar 2015 09:40:14 GMT
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353


On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang <jianshi.huang@gmail.com>
wrote:

> Hi,
>
> We've been hitting "No space left on device" errors from time to time
> lately. The job fails after exhausting its retries. Obviously, in such
> cases retrying doesn't help.
>
> Sure, the problem lies in the datanodes, but I'm wondering whether the
> Spark driver could handle it and decommission the problematic datanode
> before retrying the task, and maybe dynamically allocate another datanode
> if dynamic allocation is enabled.
>
> I think there needs to be a class of fatal errors that can't be recovered
> with retries, and it would be best if Spark could handle them gracefully.
>
> Thanks,
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>



