hadoop-yarn-dev mailing list archives

From Bharath Kumara Subramanian <codin.mart...@gmail.com>
Subject Re: Application Master High Availability
Date Mon, 24 Aug 2020 12:16:40 GMT
On a similar note, I have another question for the community.

Currently, when an AM dies or is killed for maintenance, YARN spins up a new
AM, and there is very little control over which host the new AM lands on.
Applications can control where the AM starts at job submission;
however, placement on subsequent restarts is static, limited
to the characteristics of the initial request.

   1. Is there a way to control where YARN spins up the new AM in case of
   failures, e.g. in a different fault domain?
   2. If there isn't a dynamic option, I was thinking of having some sort of
   labels that spread across fault domains. As part of maintenance, when an
   AM is killed, we would also terminate the NM, and the label constraints
   would force YARN to place the new AM in another fault domain. Will this
   approach work?
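
To make the second option concrete, here is a toy model of the idea in
Python. It is only a sketch and does not use any YARN API: the Node class,
place_am, drain, the host names, and the "am" label are all hypothetical,
and the stand-in scheduler ignores capacity, queues, and locality. It
assumes the AM label is put on exactly one host per fault domain.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a cluster node; not a YARN class.
    host: str
    fault_domain: str
    label: str
    nm_alive: bool = True


def place_am(nodes: List[Node], am_label: str) -> Optional[Node]:
    """Return the first live node carrying am_label, or None if none is left.

    Stand-in for the scheduler: a real one also weighs capacity, queues,
    and locality, none of which is modeled here.
    """
    for node in nodes:
        if node.nm_alive and node.label == am_label:
            return node
    return None


def drain(nodes: List[Node], host: str) -> List[Node]:
    """Model maintenance: mark the NM on `host` as stopped."""
    return [replace(n, nm_alive=False) if n.host == host else n for n in nodes]


# Assumed layout: the "am" label sits on exactly one host per fault domain.
cluster = [
    Node("host-a1", "fd-a", "am"),
    Node("host-a2", "fd-a", "worker"),
    Node("host-b1", "fd-b", "am"),
    Node("host-c1", "fd-c", "am"),
]

first = place_am(cluster, "am")            # initial AM placement
cluster_after = drain(cluster, first.host)  # kill the AM and stop its NM
second = place_am(cluster_after, "am")      # placement of the restarted AM
```

In this model the restarted AM can only land in another fault domain because
no second "am"-labeled host exists in the original one. If a fault domain
had two labeled hosts, the restart could stay in the same domain.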


Thanks,
Bharath

On Fri, Aug 14, 2020 at 10:11 PM Bharath Kumara Subramanian <
codin.martial@gmail.com> wrote:

> Thanks for the clarification!
> I was able to test it out with Samza as well and get it working. I look
> forward to this patch https://issues.apache.org/jira/browse/YARN-4758 as
> it would simplify things a lot.
>
> Appreciate your help.
>
> On Wed, Aug 12, 2020 at 6:51 AM epayne@apache.org <epayne@apache.org>
> wrote:
>
>> Bharath,
>> I just want to clarify a couple of things. The yarn distributed shell is
>> its own framework, and it does support preserving containers across AM
>> restart. I have tested this. But the MapReduce framework does not support
>> this feature (see https://issues.apache.org/jira/browse/MAPREDUCE-6608).
>> I spoke with Jon Eagles, and he believes that the Tez framework does
>> support the container-preserving feature.
>>
>> On Tuesday, August 11, 2020, 5:53:32 PM CDT, Bharath Kumara Subramanian <
>> codin.martial@gmail.com> wrote:
>>
>> Thanks Eric & Wilfred.
>> To give you some context, I work on Apache Samza and we have streaming as a
>> service offering on top of YARN.
>> Ideally, we would like to ensure in the event of AM restarts, the container
>> it spawned can still be taken over by the new AM.
>>
>> "Keep containers across application attempts" seems like an option that
>> might work for us.
>> Let me investigate and play with the parameter.
>>
>> Appreciate your quick response.
>>
>> Cheers,
>> Bharath
>>
>>
>> On Tue, Aug 11, 2020 at 6:31 AM Eric Payne <eric.payne1000@yahoo.com>
>> wrote:
>>
>> > Bharath,
>> >
>> > While there is no concept of HA AM in YARN, some frameworks do support
>> > preserving containers across AM restarts.
>> > In the yarn distributed shell, for example, you can set the
>> > "-keep_containers_across_application_attempts"
>> > parameter.
>> >
>> > -Eric
>> >
>> >
>> > On Monday, August 10, 2020, 7:18:18 PM CDT, Bharath Kumara Subramanian <
>> > codin.martial@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I am looking for more documentation/information on AM high availability. I
>> > looked through the documentation and found resources on RM high
>> > availability but none for AM.
>> >
>> > I understand YARN has provisions to restart the AM in case of failure, up to a
>> > configured number of attempts. However, I wanted to know if YARN has an
>> > active/standby option for AM. I would like to avoid bringing up the
>> > processing containers again in the event of AM failure and have the standby
>> > AM take over managing my application.
>> >
>> > Thanks,
>> > Bharath
>> >
>>
>
