hadoop-yarn-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: scheduler satisfying heterogeneous resource requests at same priority
Date Wed, 02 Jan 2013 20:52:12 GMT
Mappers and reducers are requested at different priorities.  Reducers have
a higher priority.  But the AM does not request all of the reducers at
once.  It requests a few at a time until all of the mappers have been
satisfied, at which point it requests the rest of the reducers.

--Bobby

On 1/2/13 2:47 PM, "Sandy Ryza" <sandy.ryza@cloudera.com> wrote:

>Thanks for looking into it Bikas.  What you wrote makes sense to me.
>You're right that it's the last request, not the largest.  Otherwise, you
>summarize my confusion well - why doesn't AppSchedulingInfo hold a list of
>ResourceRequests for each node/priority?
>
>I also don't understand why this hasn't caused a problem already for
>MapReduce when mappers and reducers request different amounts of memory.
>Is it because reduces are requested after all map containers are
>completed?  Or because they're requested at non-overlapping locations?
>
>On Wed, Jan 2, 2013 at 11:04 AM, Bikas Saha <bikas@hortonworks.com> wrote:
>
>> Reading the code seems to suggest that AppSchedulingInfo is not
>> preferring the larger request. It's simply returning the last request
>> for that priority and hostname. So it could be that in your case, the
>> larger request is the second request. You could try making it the first
>> request and check if you get the same results.
>>
>> Wrt your ResourceRequest question, having a single Resource capability
>> simplifies ResourceRequest operations. Having heterogeneous resources is
>> allowed by the API by submitting multiple ResourceRequests having
>> different Resource capabilities. See the RMContainerRequestor code in
>> the MR YARN app. Given the above, it looks like the Resource
>> heterogeneity is lost inside the AppSchedulingInfo and that may be a bug
>> or a conscious decision. Looking to folks experienced in that code for
>> an answer. How is everything working despite this? Perhaps because the
>> applications are not issuing heterogeneous requests for a given priority
>> and location. Secondly, the * catch-all is always around to save the day.
>>
>> Let me know if this makes sense. I may have missed stuff.
>>
>> -----Original Message-----
>> From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
>> Sent: Friday, December 28, 2012 4:46 PM
>> To: yarn-dev@hadoop.apache.org
>> Subject: scheduler satisfying heterogeneous resource requests at same
>> priority
>>
>> I am trying to understand how YARN schedulers are able to satisfy
>> smaller requests while larger requests are outstanding (per YARN-289).
>>
>> Consider the following situation:
>> An application submits two requests - one for a container with 1024 MB
>> and one for a container with 2048 MB.  1024 MB frees up on a node.  The
>> scheduler should (or might wish to) place the smaller container on the
>> node, instead of placing a reservation for the larger one.
>>
>> However, currently, if I understand correctly, the larger request is
>> always serviced first.  AppSchedulingInfo, which is used by all the
>> schedulers to find a container request when space becomes available,
>> stores a map of priorities to maps of node/rack/* to ResourceRequests.
>> A ResourceRequest contains a single Resource (capability), and the
>> number of containers.  Why does a ResourceRequest not allow for
>> heterogeneous containers?  Is this just not supported yet because it
>> hasn't been needed yet?  Or is there a more fundamental reason I'm
>> missing about why it doesn't make sense?
>>
>> many thanks for any guidance,
>> Sandy
>>
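
The behaviour discussed in this thread can be illustrated with a small, self-contained sketch. It is not the actual AppSchedulingInfo code: the class names (RequestTableSketch, Request, SingleSlotTable, MultiSlotTable), the integer priorities, and the plain memory field are all hypothetical stand-ins. It only assumes what the thread describes, namely a map of priorities to maps of location (node/rack/*) to one request, and contrasts that with the list-per-key alternative Sandy asks about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RequestTableSketch {

    // Hypothetical stand-in for a ResourceRequest: one capability and a count.
    static final class Request {
        final int memoryMb;
        final int numContainers;

        Request(int memoryMb, int numContainers) {
            this.memoryMb = memoryMb;
            this.numContainers = numContainers;
        }
    }

    // priority -> (location -> single request), as described in the thread.
    // A second update for the same priority and location replaces the first.
    static final class SingleSlotTable {
        private final Map<Integer, Map<String, Request>> table = new HashMap<>();

        void update(int priority, String location, Request request) {
            table.computeIfAbsent(priority, p -> new HashMap<>())
                 .put(location, request);
        }

        Request get(int priority, String location) {
            Map<String, Request> byLocation = table.get(priority);
            return byLocation == null ? null : byLocation.get(location);
        }
    }

    // Assumed alternative: priority -> (location -> list of requests),
    // so requests with different capabilities can coexist under one key.
    static final class MultiSlotTable {
        private final Map<Integer, Map<String, List<Request>>> table = new HashMap<>();

        void update(int priority, String location, Request request) {
            table.computeIfAbsent(priority, p -> new HashMap<>())
                 .computeIfAbsent(location, l -> new ArrayList<>())
                 .add(request);
        }

        List<Request> get(int priority, String location) {
            Map<String, List<Request>> byLocation = table.get(priority);
            if (byLocation == null) {
                return new ArrayList<>();
            }
            return byLocation.getOrDefault(location, new ArrayList<>());
        }
    }

    public static void main(String[] args) {
        SingleSlotTable single = new SingleSlotTable();
        single.update(1, "*", new Request(1024, 1));
        single.update(1, "*", new Request(2048, 1)); // overwrites the 1024 MB entry
        System.out.println(single.get(1, "*").memoryMb); // prints 2048

        MultiSlotTable multi = new MultiSlotTable();
        multi.update(1, "*", new Request(1024, 1));
        multi.update(1, "*", new Request(2048, 1));
        System.out.println(multi.get(1, "*").size()); // prints 2
    }
}
```

With the single-slot layout, whichever request arrives last for a given priority and location is the only one a lookup can return, which matches Bikas's reading that the last request wins rather than the largest. Requests made at different priorities land under different outer keys and do not collide, which is consistent with Bobby's explanation of why MapReduce is unaffected.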
