hadoop-yarn-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@apache.org>
Subject Re: Is there any alternative solution thinking on the event model of YARN
Date Fri, 21 Feb 2014 03:47:17 GMT
I actually think that the component boundaries are much cleaner now in YARN. Components
(mostly) interact only via events, not via the synchronous method calls Ravi hinted at.
Each event is decorated with its source and destination. Arguably this is done only in code
comments, but if you think it helps, you can pursue https://issues.apache.org/jira/browse/YARN-1743.
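To make "decorating events with source and destination" concrete, here is a minimal sketch, assuming the decoration lives in a custom runtime annotation instead of only in comments. The annotation `EventFlow`, the enum `AppEventType` and its constants are hypothetical names for illustration; this is not the design of YARN-1743 and not YARN's API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation recording which component sends an event type and which one handles it.
@Retention(RetentionPolicy.RUNTIME) // keep it visible to reflection at runtime
@Target(ElementType.FIELD)          // enum constants are compiled as static fields
@interface EventFlow {
    String source();
    String destination();
}

// Hypothetical event types; the component names are borrowed from the thread for illustration.
enum AppEventType {
    @EventFlow(source = "ContainerImpl", destination = "ApplicationImpl")
    CONTAINER_FINISHED,
    @EventFlow(source = "LogAggregationService", destination = "ApplicationImpl")
    LOG_HANDLING_FINISHED,
    UNDOCUMENTED // no annotation: the flow cannot be read from the declaration
}

final class EventFlows {
    private EventFlows() {}

    // Reads the declared flow of one event type, or null if it was never annotated.
    static EventFlow of(Enum<?> type) {
        try {
            return type.getDeclaringClass().getField(type.name()).getAnnotation(EventFlow.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("Not an enum constant field: " + type, e);
        }
    }

    // Renders "TYPE: source -> destination" for logging or generated documentation.
    static String describe(Enum<?> type) {
        EventFlow flow = of(type);
        if (flow == null) {
            return type.name() + ": (undeclared)";
        }
        return type.name() + ": " + flow.source() + " -> " + flow.destination();
    }
}
```

Looping over `AppEventType.values()` and printing `EventFlows.describe(t)` would then list every declared flow in one place, which is the kind of overview Jeff is asking for.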

The implementation in YARN is in fact loosely modeled on actors. It's a custom implementation;
we didn't go the full route because we didn't need to.
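One way to picture "loosely modeled on actors" is the minimal sketch below, assuming each component owns a private mailbox queue drained by one dedicated thread, so the component's handling code never runs concurrently with itself. `MailboxComponent`, `tell` and `stopAndJoin` are hypothetical names; this is the per-actor variant of the idea, not YARN's dispatcher.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical actor-like component: a private mailbox drained by one dedicated thread.
final class MailboxComponent<E> {
    private static final Object STOP = new Object(); // sentinel that tells the worker to exit
    private final BlockingQueue<Object> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker;

    MailboxComponent(String name, Consumer<E> behaviour) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Object next = mailbox.take(); // blocks until an event arrives
                    if (next == STOP) {
                        return;
                    }
                    @SuppressWarnings("unchecked")
                    E event = (E) next; // only tell() enqueues non-sentinel objects, and it takes an E
                    behaviour.accept(event); // always runs on this one thread, so no locking inside
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        worker.start();
    }

    // Asynchronous send, comparable to "actorRef ! message": enqueue and return immediately.
    void tell(E event) {
        mailbox.add(event);
    }

    // Lets the worker finish everything queued before this call, then waits for it to exit.
    void stopAndJoin() {
        mailbox.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The design choice here is isolation over sharing: state touched only by `behaviour` needs no locks. A single shared queue and thread serving all components is a lighter-weight alternative with the same "one event at a time" property.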

Like Ravi said, it takes a little getting used to. I have seen developers beyond the initial
set take a little while to get used to it, but then do lots of things much more easily once
they get a grip on it, especially compared with my experience of developers working on the Hadoop
1.x code, where we didn't have such clean component boundaries.

Let us know if things like YARN-1743 will help. We can do more. Definitely look at the state
machines, as Ravi mentioned; they can simplify your understanding of things a lot.
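Why the state machines help can be illustrated with a minimal sketch, assuming a hand-rolled transition table in which every legal (state, event) pair is declared in one place. It is not YARN's `StateMachineFactory` API; `AppState`, `AppEvent` and `TransitionTable` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical states and events for an application-like object.
enum AppState { NEW, RUNNING, FINISHED }
enum AppEvent { START, FINISH, KILL }

// Hand-rolled transition table: (current state, event) -> next state.
final class TransitionTable {
    private final Map<AppState, Map<AppEvent, AppState>> table = new EnumMap<>(AppState.class);

    // Declares one legal transition; returns this so declarations can be chained.
    TransitionTable add(AppState from, AppEvent on, AppState to) {
        table.computeIfAbsent(from, s -> new EnumMap<AppEvent, AppState>(AppEvent.class)).put(on, to);
        return this;
    }

    // Looks up the next state; an undeclared (state, event) pair is rejected loudly.
    AppState next(AppState current, AppEvent event) {
        AppState to = table.getOrDefault(current, Map.of()).get(event);
        if (to == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        return to;
    }

    // Lists every edge in enum order, e.g. to print or to feed into a diagram tool.
    List<String> edges() {
        List<String> out = new ArrayList<>();
        table.forEach((from, row) -> row.forEach((on, to) -> out.add(from + " --" + on + "--> " + to)));
        return out;
    }
}
```

Because the table is the single source of truth, reading `edges()` answers "what can happen to this object, and on which events" without tracing any handler code, which is the kind of overview a generated state diagram gives.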


On Feb 20, 2014, at 5:54 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:

> Hi Ravi,
> Thanks for your reply. The reason I am thinking about an alternative to the
> event model is that I found the actor model used by Spark much easier to
> read and understand.
> Here I compare two differences in how these two frameworks are used (I will
> ignore performance for now):
> 1.  An actor explicitly specifies the event destination (event handler) when
> sending a message, while in the YARN event model it is not clear which
> handler will receive the event.
>     e.g.
>     actor:
>         actorRef ! message            // clearly, actorRef is the event destination (event handler)
>     yarn:
>         dispatcher.dispatch(message)  // not clear who the event handler is; we must look for
>                                       // the event registration code, which is elsewhere
> 2. An actor has the event source built in, so it is easy to send a message
> back. There are lots of state machines in YARN, and these state machines
> often send messages to each other, e.g. ContainerImpl interacts with
> ApplicationImpl by sending messages.
>    e.g.
>    actor:
>        sender ! message            // sender is the sending actor's reference, built into
>                                    // the actor, so it is easy to send a message back
>    yarn:
>        dispatcher.dispatch(event)  // the YARN event model does not know the event source; even
>                                    // if the handler knew the source, it would still need to rely
>                                    // on the dispatcher to send the message. It is not easy to
>                                    // see the event flow from this piece of code.
>    You still need to look for the event registration code to find the event
> handler.
> Let me know what you think. Thanks
> Jeff Zhang
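The registration-based routing Jeff describes can be sketched as follows, assuming handlers are registered per event-type enum class and looked up at dispatch time. `RoutingDispatcher`, `Evt`, `ContainerEvt` and `ApplicationEvt` are hypothetical names, and the dispatch is synchronous for brevity; this is an illustration of the pattern, not YARN's dispatcher API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event: it carries only its type, like dispatcher.dispatch(message) in the thread.
final class Evt {
    final Enum<?> type;

    Evt(Enum<?> type) {
        this.type = type;
    }
}

// Hypothetical event-type enums, one per receiving component.
enum ContainerEvt { LAUNCHED, DONE }
enum ApplicationEvt { INIT, CONTAINER_DONE }

// Synchronous stand-in for a dispatcher that routes by event-type enum class.
final class RoutingDispatcher {
    private final Map<Class<?>, Consumer<Evt>> routes = new HashMap<>();

    // Registration usually happens once, at startup, far from the code that dispatches.
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Evt> handler) {
        routes.put(eventTypeClass, handler);
    }

    // The call site names no destination: the handler is chosen by the event type's enum class.
    void dispatch(Evt event) {
        Class<?> key = event.type.getDeclaringClass();
        Consumer<Evt> handler = routes.get(key);
        if (handler == null) {
            throw new IllegalStateException("No handler registered for " + key.getSimpleName());
        }
        handler.accept(event);
    }
}
```

Reading a call such as `dispatch(new Evt(ApplicationEvt.CONTAINER_DONE))` tells you nothing about the receiver; only the `register` calls do, which is exactly the indirection the comparison with `actorRef ! message` points at. The upside is that senders stay decoupled from the concrete handler.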
> On Fri, Feb 21, 2014 at 4:02 AM, Ravi Prakash <ravihoo@ymail.com> wrote:
>> Hi Jeff!
>> The event model does have some issues, but I believe it has made things a
>> lot simpler. The source could easily be added to the event object if you
>> needed it to. There might be issues with flow control, but I thought they
>> were fixed where they were cropping up.
>> MRv1 had all these method calls which could affect the state in several
>> ways, and synchronization and locking was extremely difficult to get right
>> (perhaps only by the select few who completely understood the codebase).
>> The event model is so much simpler and mvn -Pvisualize draws out a
>> beautiful state diagram. It takes a little getting used to, but you can
>> connect the debugger and trace through the code too with conditional
>> breakpoints. This is of course just my opinion.
>> Ravi
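Ravi's point that the source could be added to the event object can be sketched like this, assuming the event carries a reference to the sender's handler so the receiver can reply directly, similar in spirit to the implicit `sender` in the actor example. `Handler`, `LocalizationRequest`, `ContainerComponent` and `LocalizerComponent` are hypothetical names. Calls are synchronous for brevity; in an asynchronous setup `replyTo` would typically enqueue onto the sender's queue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal handler interface.
interface Handler<E> {
    void handle(E event);
}

// Hypothetical event that records its source so the receiver knows whom to answer.
final class LocalizationRequest {
    final String resource;
    final Handler<String> replyTo; // the event's source, carried explicitly

    LocalizationRequest(String resource, Handler<String> replyTo) {
        this.resource = resource;
        this.replyTo = replyTo;
    }
}

// Receiver: answers the sender named in the event instead of going back through a router.
final class LocalizerComponent implements Handler<LocalizationRequest> {
    @Override
    public void handle(LocalizationRequest request) {
        request.replyTo.handle("localized " + request.resource);
    }
}

// Sender: passes itself as replyTo and records the replies it gets back.
final class ContainerComponent implements Handler<String> {
    final List<String> replies = new ArrayList<>();

    @Override
    public void handle(String reply) {
        replies.add(reply);
    }

    void requestLocalization(Handler<LocalizationRequest> localizer, String resource) {
        localizer.handle(new LocalizationRequest(resource, this));
    }
}
```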
>> On Wednesday, February 19, 2014 6:33 PM, Jeff Zhang <jezhang@gopivotal.com> wrote:
>> Hi all,
>> I have studied YARN for several months, and have some thoughts on the event
>> model of YARN.
>> 1.  The event model does help the performance of YARN by allowing async calls.
>> 2.  But the event model makes the boundaries between components unclear. The
>> event receiver does not know the sender of an event, which makes it
>> difficult for the reader to understand the event flow.
>>      E.g. in the node manager, there are several event senders and handlers,
>> including the container, application, localization server, log aggregation
>> service and so on. One component sends events to another component.
>> Because the receiver does not know the event sender, it is not easy to read
>> the code and understand the event flow.
>>      The event flow in the resource manager is even more complex, involving
>> RMApp, RMAppAttempt, RMContainer, RMNode and the Scheduler.
>> 3.  IMHO, the complexity of the event model makes it hard for new contributors
>> to understand the codebase, and makes the codebase hard to maintain in the
>> future. One small change in a state machine may affect other components,
>> and the cause is difficult to find.
>> Just wondering whether there has already been some thinking on the event
>> model of YARN. And correct me if my understanding is wrong.
>> Thanks
>> Jeff Zhang

