hadoop-yarn-dev mailing list archives

From Bikas Saha <bi...@hortonworks.com>
Subject RE: scheduler satisfying heterogeneous resource requests at same priority
Date Wed, 02 Jan 2013 19:04:54 GMT
Reading the code suggests that AppSchedulingInfo does not prefer the
larger request. It simply returns the last request for that priority
and hostname, so in your case the larger request may be the second
request. You could try making it the first request and check whether
you get the same results.
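To make the "last request wins" behavior concrete, here is a minimal Java sketch of a table keyed by priority and then by resource name (host, rack, or `*`). The names `Ask`, `SchedulingTableSketch`, `update`, `lookup` and `visibleMemoryMb` are hypothetical, and the sketch only assumes the overwrite semantics described above; it is not the actual AppSchedulingInfo code. Because the inner map holds one request per location, a second ask at the same priority and host replaces the first.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a ResourceRequest: one location,
// one capability (memory only, for brevity) and a container count.
final class Ask {
    final int priority;
    final String resourceName; // host, rack, or "*"
    final int memoryMb;
    final int numContainers;

    Ask(int priority, String resourceName, int memoryMb, int numContainers) {
        this.priority = priority;
        this.resourceName = resourceName;
        this.memoryMb = memoryMb;
        this.numContainers = numContainers;
    }
}

// Hypothetical table mimicking a priority -> location -> request layout.
final class SchedulingTableSketch {
    private final Map<Integer, Map<String, Ask>> requests = new TreeMap<>();

    void update(Ask ask) {
        // put() replaces any ask already stored for this priority and
        // location, so a later ask with a different capability wins.
        requests.computeIfAbsent(ask.priority, p -> new HashMap<>())
                .put(ask.resourceName, ask);
    }

    Ask lookup(int priority, String resourceName) {
        Map<String, Ask> byLocation = requests.get(priority);
        return byLocation == null ? null : byLocation.get(resourceName);
    }

    // Submits two asks in order and reports which memory size remains visible.
    static int visibleMemoryMb(Ask first, Ask second) {
        SchedulingTableSketch table = new SchedulingTableSketch();
        table.update(first);
        table.update(second);
        return table.lookup(first.priority, first.resourceName).memoryMb;
    }
}
```

Under this assumption, submitting 1024 MB then 2048 MB leaves only the 2048 MB ask visible, and swapping the order leaves only the 1024 MB ask, which is what the reordering experiment would reveal.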

Regarding your ResourceRequest question, having a single Resource
capability simplifies ResourceRequest operations. The API allows
heterogeneous resources by submitting multiple ResourceRequests with
different Resource capabilities; see the RMContainerRequestor code in the
MR YARN app. Given the above, it looks like the Resource heterogeneity is
lost inside AppSchedulingInfo, which may be a bug or a conscious
decision. I am looking to folks experienced in that code for an answer.
How is everything working despite this? Perhaps because applications do
not issue heterogeneous requests for a given priority and location.
Second, the * catch-all is always around to save the day.
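One way heterogeneity could be preserved is to key the innermost map by capability as well as location. The Java sketch below is a hypothetical design illustration, not a description of what YARN does or plans to do; `CapAsk`, `HeterogeneousTableSketch` and `capabilitiesMb` are invented names, and memory stands in for the full Resource capability.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical ask with a single memory capability, as discussed in the thread.
final class CapAsk {
    final int priority;
    final String resourceName; // host, rack, or "*"
    final int memoryMb;
    final int numContainers;

    CapAsk(int priority, String resourceName, int memoryMb, int numContainers) {
        this.priority = priority;
        this.resourceName = resourceName;
        this.memoryMb = memoryMb;
        this.numContainers = numContainers;
    }
}

// Hypothetical layout: priority -> location -> capability (MB) -> ask.
// Keying the innermost map by capability keeps asks of different sizes apart.
final class HeterogeneousTableSketch {
    private final Map<Integer, Map<String, TreeMap<Integer, CapAsk>>> requests =
            new TreeMap<>();

    void update(CapAsk ask) {
        // Only an ask with the same priority, location and size is replaced.
        requests.computeIfAbsent(ask.priority, p -> new HashMap<>())
                .computeIfAbsent(ask.resourceName, r -> new TreeMap<>())
                .put(ask.memoryMb, ask);
    }

    // Distinct outstanding memory sizes at this priority and location, smallest first.
    List<Integer> capabilitiesMb(int priority, String resourceName) {
        Map<String, TreeMap<Integer, CapAsk>> byLocation = requests.get(priority);
        if (byLocation == null) {
            return new ArrayList<>();
        }
        TreeMap<Integer, CapAsk> bySize = byLocation.get(resourceName);
        if (bySize == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(bySize.keySet());
    }
}
```

With this keying, a 2048 MB ask and a 1024 MB ask at the same priority and location both remain outstanding, at the cost of one more map level to walk on every scheduling decision.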

Let me know if this makes sense. I may have missed stuff.

-----Original Message-----
From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: Friday, December 28, 2012 4:46 PM
To: yarn-dev@hadoop.apache.org
Subject: scheduler satisfying heterogeneous resource requests at same priority

I am trying to understand how YARN schedulers are able to satisfy smaller
requests while larger requests are outstanding (per YARN-289).

Consider the following situation:
An application submits two requests: one for a container with 1024 MB and
one for a container with 2048 MB. Then 1024 MB frees up on a node. The
scheduler should (or might wish to) place the smaller container on the
node instead of placing a reservation for the larger one.
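The placement choice in this scenario can be sketched as follows, assuming a scheduler that can see every outstanding ask size at a priority (which, as discussed in this thread, AppSchedulingInfo may not expose). `PlacementSketch`, `decide` and the best-fit rule are hypothetical illustrations, not any YARN scheduler's actual policy: allocate the largest ask that fits in the freed memory, and reserve for the largest ask only when nothing fits.

```java
import java.util.List;

final class PlacementSketch {
    // Hypothetical outcome of considering one node's free memory.
    enum Action { ALLOCATE, RESERVE, NONE }

    static final class Decision {
        final Action action;
        final int memoryMb;

        Decision(Action action, int memoryMb) {
            this.action = action;
            this.memoryMb = memoryMb;
        }
    }

    // askSizesMb: memory of each outstanding container ask at one priority.
    // Hypothetical rule: allocate the largest ask that fits; otherwise
    // reserve the node for the largest outstanding ask.
    static Decision decide(List<Integer> askSizesMb, int freeMb) {
        int bestFit = -1;
        int largest = -1;
        for (int size : askSizesMb) {
            if (size <= freeMb && size > bestFit) {
                bestFit = size;
            }
            if (size > largest) {
                largest = size;
            }
        }
        if (bestFit > 0) {
            return new Decision(Action.ALLOCATE, bestFit);
        }
        if (largest > 0) {
            return new Decision(Action.RESERVE, largest);
        }
        return new Decision(Action.NONE, 0);
    }
}
```

For asks of 1024 MB and 2048 MB with 1024 MB free, this rule allocates the 1024 MB container; with only the 2048 MB ask outstanding, it reserves instead.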

However, if I understand correctly, the larger request is currently
always serviced first. AppSchedulingInfo, which all the schedulers use
to find a container request when space becomes available, stores a map
of priorities to maps of node/rack/* to ResourceRequests. A
ResourceRequest contains a single Resource (capability) and the number of
containers. Why does a ResourceRequest not allow for heterogeneous
containers? Is this just not supported because it hasn't been needed
yet? Or is there a more fundamental reason I'm missing about why it
doesn't make sense?

Many thanks for any guidance,
