tez-dev mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: Handling ATS downtime
Date Thu, 22 Jan 2015 00:19:38 GMT
Agreed. @Jonathan, can you file JIRAs for the 2 cases with the related stack traces?

The domain handling might be the trickier issue as we will need to disable ATS publishing
in case the domain could not be created.

— Hitesh 

On Jan 21, 2015, at 4:03 PM, Jonathan Eagles <jeagles@gmail.com> wrote:

> I just checked this behavior in a secure cluster and if it fails to get a
> timeline server delegation token or fails to post the domain, the job will
> fail. We should consider making these operations "best effort" as well.
> On Jan 21, 2015 5:33 PM, "Hitesh Shah" <hitesh@apache.org> wrote:
>> Actually at this time, the current impl just logs a WARN when there is a
>> failure pushing data to ATS. ATS is not treated as a critical entity as it
>> is not needed for job recovery.
>> — Hitesh
>> On Jan 21, 2015, at 3:01 PM, Rohini Palaniswamy <rohini.aditya@gmail.com>
>> wrote:
>>> Folks,
>>>    In the middle of a big discussion on how to get delegation tokens from
>>> ATS for Oozie jobs, another question came up. What is the behaviour of
>>> running Tez jobs if ATS goes down? Haven't tried it out, but my guess is
>>> the job is going to fail. Or do we do something now to handle the failure
>>> and still have the job complete successfully?
>>> Regards,
>>> Rohini