tez-dev mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: Tez session max attempts of one when recovery is disabled
Date Fri, 11 Sep 2015 17:03:00 GMT
Hello Jon,

If recovery is disabled, there is no clear way to know whether the previous attempt was in
the process of doing a commit and was aborted at that point. Given that there is no clear way
to safely restart or re-process the work, I believe the Tez client sets max attempts to 1
if recovery is disabled. Furthermore, with sessions, DAGs are submitted over RPC and not via
the ApplicationSubmissionContext, so there will be no record of the DAG having been submitted
if recovery is disabled. The second attempt in this case will launch but will not do anything
unless the client re-submits the DAG.
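
As a rough illustration of the behavior described above (not the actual Tez client code), the
decision can be modeled as follows. The function name and default values are hypothetical, the
configuration key names are given as best recalled and should be checked against
TezConfiguration, and the assumption that non-session submissions keep the configured retry
count is inferred from the question being answered rather than confirmed here.

```python
# Hypothetical model of the max-attempts decision; this is NOT Tez source code.
# Key names are as best recalled; verify them against TezConfiguration before relying on them.
RECOVERY_KEY = "tez.dag.recovery.enabled"
MAX_ATTEMPTS_KEY = "tez.am.max.app.attempts"


def effective_max_attempts(conf, is_session):
    """Return the number of AM attempts this model would allow.

    conf is a plain dict standing in for the Tez configuration.
    The defaults used below (recovery on, 2 attempts) are illustrative assumptions.
    """
    recovery_enabled = conf.get(RECOVERY_KEY, True)
    configured = conf.get(MAX_ATTEMPTS_KEY, 2)

    # Without recovery there is no record of a DAG submitted over RPC and no way
    # to tell whether a commit was in flight, so a second attempt cannot safely
    # redo the work: cap the session at a single attempt.
    if is_session and not recovery_enabled:
        return 1
    return configured
```

In this model a session with recovery disabled is always limited to one attempt, regardless of
the configured retry count.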

I think we should look at backporting all relevant recovery fixes to branch 0.7 if you would
like to stabilize on that branch. Are there any known fixes on master that we should backport?

Jeff has been driving a lot of changes for recovery, with a lot of fixes being tracked off
https://issues.apache.org/jira/browse/TEZ-2581. It would be good if you could help review
and test these patches. I believe Jeff was planning to do a full rebase
after TEZ-2003 got merged in but may not have done that yet.

thanks
— Hitesh 

On Sep 11, 2015, at 9:38 AM, Jonathan Eagles <jeagles@gmail.com> wrote:

> I am running Pig on Tez (0.7.1 pre-release) with recovery disabled and noticed
> that when the AM fails there are no other attempts. What is it about
> sessions versus non-sessions (what bad thing are we preventing) that keeps
> us from retrying when recovery is disabled?
> 
> (background) Pig only runs sessions, even when executing just a single DAG,
> and recovery is fragile in 0.7.1, where hangs are likely and are fixed only in 0.8.
> I want Pig on Tez to be as stable as Pig on MR; AM failures are going
> to dissuade users from migrating to Pig on Tez.
> 
> Jon

