hadoop-yarn-dev mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject multiple requests for
Date Tue, 08 Jan 2013 01:39:43 GMT
I've come across an NPE in AppSchedulingInfo, so I looked around to try to
determine the cause, and I think I came across a problem with how containers
are scheduled.  It seems like somebody should have run into this already,
so I wanted to ask about it before filing a JIRA.  Am I just
misunderstanding how things work?

When requesting a node-local container, YARN schedulers expect three
ResourceRequests - one at the node-level, one at the rack level, and one at
the "*" level.  For each application and priority, these requests are
stored by the RM as a map of location strings to ResourceRequests.
Schedulers try to satisfy requests node-locally, but fall back to
rack-local, and then off-switch, allocation after a given number of
heartbeats pass.  When a node-local container is allocated, the number of
outstanding containers is decremented at each level.  When a rack-local
container is allocated, only the rack-local and "*" counts are decremented.
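
As a rough illustration of the bookkeeping described above, here is a minimal
Java sketch.  It is not the actual RM code: the class
LocalityBookkeepingSketch, its method names, and the host and rack strings are
all hypothetical, and it tracks plain counts instead of real ResourceRequest
objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the per-application, per-priority bookkeeping.
// This is NOT the real AppSchedulingInfo; all names are illustrative.
class LocalityBookkeepingSketch {
    static final String ANY = "*";

    // Location string (host, rack, or "*") -> outstanding container count.
    final Map<String, Integer> outstanding = new HashMap<>();

    // A node-local ask arrives as three requests: node, rack, and "*".
    void request(String host, String rack, int containers) {
        outstanding.merge(host, containers, Integer::sum);
        outstanding.merge(rack, containers, Integer::sum);
        outstanding.merge(ANY, containers, Integer::sum);
    }

    // Node-local allocation: decrement the count at every level.
    void allocateNodeLocal(String host, String rack) {
        decrement(host);
        decrement(rack);
        decrement(ANY);
    }

    // Rack-local allocation: only the rack and "*" levels are decremented.
    void allocateRackLocal(String rack) {
        decrement(rack);
        decrement(ANY);
    }

    private void decrement(String location) {
        outstanding.merge(location, -1, Integer::sum);
    }

    public static void main(String[] args) {
        LocalityBookkeepingSketch sketch = new LocalityBookkeepingSketch();
        sketch.request("host1", "/rack1", 1);
        sketch.allocateRackLocal("/rack1");
        // The node-level count is untouched by the rack-local allocation.
        System.out.println("host1:  " + sketch.outstanding.get("host1"));
        System.out.println("/rack1: " + sketch.outstanding.get("/rack1"));
        System.out.println("*:      " + sketch.outstanding.get(ANY));
    }
}
```

In this model, a single rack-local allocation leaves the node-level count at 1
while the rack and "*" counts drop to 0.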

This means that if a rack-local container is allocated, the node-local
request will still be outstanding, and when the scheduler later tries to
allocate it, the scheduler should run into an NPE, as there will be no
rack-local ResourceRequest left to decrement.

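
The scenario can be sketched in a few lines of Java.  This is a hypothetical
reproduction, not the real scheduler code: the class
StaleNodeLocalRequestSketch and the host and rack names are made up, and it
assumes (as the message implies) that a location's entry is removed from the
map once its count reaches zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reproduction of the stale node-local request scenario.
// Names and the remove-at-zero behavior are assumptions for illustration.
class StaleNodeLocalRequestSketch {

    // Decrement a location's count; drop the entry once it reaches zero.
    static void decrement(Map<String, Integer> outstanding, String location) {
        // Unboxing a missing (null) entry throws NullPointerException.
        int remaining = outstanding.get(location) - 1;
        if (remaining == 0) {
            outstanding.remove(location);
        } else {
            outstanding.put(location, remaining);
        }
    }

    // Returns true if the sequence of allocations ends in an NPE.
    static boolean reproduces() {
        Map<String, Integer> outstanding = new HashMap<>();
        outstanding.put("host1", 1);
        outstanding.put("/rack1", 1);
        outstanding.put("*", 1);

        // Rack-local allocation: only the rack and "*" levels are touched,
        // so the "host1" entry stays behind.
        decrement(outstanding, "/rack1");
        decrement(outstanding, "*");

        try {
            // Later, a node-local allocation on host1 decrements every level.
            decrement(outstanding, "host1");
            decrement(outstanding, "/rack1"); // no rack entry is left
            decrement(outstanding, "*");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduces());
    }
}
```
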
What would be the best way to deal with this?  It seems like node-local
ResourceRequests need to be tied to rack-local ResourceRequests, so that
node-local requests can be removed when their corresponding rack-local
requests are, but the current AllocateRequest is a list of independent
resource requests.

thanks for any guidance,
