mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 48914: Added GPU_RESOURCES capability to FrameworkInfo.
Date Tue, 21 Jun 2016 01:28:48 GMT


> On June 19, 2016, 2:18 a.m., Guangya Liu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, line 1274
> > <https://reviews.apache.org/r/48914/diff/1/?file=1423480#file1423480line1274>
> >
> >     s/slaves/agents
> 
> Kevin Klues wrote:
>     I thought about this as I was writing it, but I left it as slave because all of the
other comments in the file use slave. I figured it was better to stay consistent and wait
for a full pass to change it than have one coment refering to agent and the next referring
to slave.
> 
> Guangya Liu wrote:
>     I think that we should use `agent` but not `slave` for the new code as we need finally
update all `slave` to `agent`, you can take a look at https://github.com/apache/mesos/blob/master/src/tests/hierarchical_allocator_tests.cpp
, all of the latest code are using `agent` but not `slave`
> 
> Kevin Klues wrote:
>     I'm still not convinced. In the case of adding a new test, I agree -- everything
should be worded as `agent`. This is because you typically consume each test individually,
and so long as the terminology within the test is self-consistent, everything looks good.
>     
>     I just think its weird -- within the same function -- to have one comment say `slave`,
the next say `agent`, and then another say `slave`, all within 10 lines of each other.
> 
> Guangya Liu wrote:
>     You may see that all logs and outputs in this file are using `agent` but not `slave`,
it was fixed here https://reviews.apache.org/r/45213/diff/3#8 , I think that we should not
use `slave` from now on but keep `agent` for all new introduced code.

Thanks for noticing this Guangya! I'm fine with the usage of "slave" in these comments since
the adjacent comments and data structures are still using slave, so Kevin is trying to keep
things consistent. It seems feasible to do a sweep renaming in the allocation code, would
you like to do this?


> On June 19, 2016, 2:18 a.m., Guangya Liu wrote:
> > src/master/allocator/mesos/hierarchical.cpp, lines 1277-1279
> > <https://reviews.apache.org/r/48914/diff/1/?file=1423480#file1423480line1277>
> >
> >     This logic seems also allow agents without gpu resources offered to the framework
with gpuCapble, is it expected behavior?
> 
> Kevin Klues wrote:
>     Yes, that was my intended behaviour as I wrote it. If we think there is some problem
with this, we should discuss it further.
> 
> Guangya Liu wrote:
>     So what is the use of the agent without GPU for the framework which request gpu resources?
> 
> Klaus Ma wrote:
>     As far as I known, some GPU application can also run on CPU.
> 
> Kevin Klues wrote:
>     Not all tasks launched by a GPU capable framework will necessarily require GPU resources.
Some will, some wont -- but at least this capability gives these frameworks a chance to be
offered GPUs before non-GPU-capable frameworks starve them out.  In other words, this capability
is more about making sure that frameworks that definitely *never* need GPU resources, don't
fill up GPU capable machines with non-GPU workloads. It shouldn't restrict frameworks that
have this capability from receiving their fair share of resources in the cluster.
> 
> Guangya Liu wrote:
>     Got it, then I think that here should have some comments to clarify the behavior
as discussed above to make sure everyone on the same page.

I suggested documentation for the capability that should capture the intent of this capability.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48914/#review138432
-----------------------------------------------------------


On June 18, 2016, 10:07 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48914/
> -----------------------------------------------------------
> 
> (Updated June 18, 2016, 10:07 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5634
>     https://issues.apache.org/jira/browse/MESOS-5634
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Due to the scarce resource problem described in MESOS-5377, we are
> introducing a GPU_RESOURCES Framework capability. This capability
> allows the Mesos allocator to make better decisions about which
> frameworks should receive resources from GPU capable machines. In
> essence, the allocator ONLY allocate resources from GPU capable
> machines to frameworks that have this capability. This is necessary to
> prevent non-GPU workloads from filling up the GPU machines and
> preventing GPU workloads to run.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto e4c5bd31cf035707036eb509336fe051119b4e78 
>   include/mesos/v1/mesos.proto 9be22f02861f1eb89ab547d88530faf90ebee7ab 
>   src/master/allocator/mesos/hierarchical.hpp 9c6b23abe2b0cb16412f1ed90165f8d0c14552fa

>   src/master/allocator/mesos/hierarchical.cpp 8b7b3afb5770c617918ec4864faaf8d8a7a81ef2

>   src/tests/containerizer/nvidia_gpu_isolator_tests.cpp e06d107f2dcdb9b470e330c8ceee66a54220d41b

> 
> Diff: https://reviews.apache.org/r/48914/diff/
> 
> 
> Testing
> -------
> 
> $ make -j check; sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message