mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kevin Klues <klue...@gmail.com>
Subject Re: Review Request 48373: Integrated the `NvidiaGpuAllocator` into the `NvidiaGpuIsolator`.
Date Tue, 28 Jun 2016 17:08:31 GMT


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 442-446
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line442>
> >
> >     Unfortunately this won't accomplish your intent of providing context in the
failure message because the .onFailed method does not provide composition, your returned Failure
here will fall into the void.
> >     
> >     To accomplish this, we need a .then which can accept a Future<T> in addition
to being able to take a T.

Yes, I see this now. The .then is just chained onto the .onFailed. In fact, the way I had
it before, I think we never would have actually cleaned up anything (the .onFailed() was the
only thing chained to the original Future).


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 375
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line375>
> >
> >     Hm.. seems we should do another lookup of the 'info' since we are returning
to the isolator context and things may have occurred in the interim. Perhaps it's worth pulling
out an `_update` continuation to force this (no captures possible).

I see you committed this with:
```
      .then(defer(PID<NvidiaGpuIsolatorProcess>(this),
                  &NvidiaGpuIsolatorProcess::_update,
                  containerId,
                  lambda::_1));
```

Why `PID<NvidiaGpuIsolatorProcess>(this)` instead of `self()`?

Also, I see why it makes sense to re-lookup `info`, but I'm unclear on why the continuation
had to be pulled out into a separate function.  The logic feels very disjoint now when adding
GPUs vs. taking them away.  Maybe it's just the name `_update` since it only actually gets
triggered in the "adding" GPUs case.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 447
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line447>
> >
> >     This needs to defer to self since the infos map is managed by the isolator actor.
In general, capturing and using 'this' within a continuation without defering to self is dangerous.

Yes. This was an oversight. When I started thsi patch I was unclear on the need for `defer()`
inside the `.then()`, and looks like I missed one when doing a pass over the code to add it.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, line 261
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line261>
> >
> >     Hm.. wonder why we didn't just use foreach here?

I believe the original code needed a normal `for` look for some reason and the new code just
inherited this form.  I agree -- a `foreach` is more appropriate here now though.


> On June 28, 2016, 2:55 a.m., Benjamin Mahler wrote:
> > src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp, lines 448-451
> > <https://reviews.apache.org/r/48373/diff/4/?file=1428688#file1428688line448>
> >
> >     To guard against races, how about:
> >     
> >     ```
> >           CHECK(infos.contains(containerId));
> >           delete infos.at(containerId);
> >           infos.erase(containerId);
> >     
> >           return Nothing();
> >     ```

Makes sense.  It's the same reasoning as for `update()` with needing to re-lookup `info` (only
this time we don't need to store a temporary to it since we are just deleting it).


- Kevin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/48373/#review139714
-----------------------------------------------------------


On June 23, 2016, 4:34 p.m., Kevin Klues wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/48373/
> -----------------------------------------------------------
> 
> (Updated June 23, 2016, 4:34 p.m.)
> 
> 
> Review request for mesos and Benjamin Mahler.
> 
> 
> Bugs: MESOS-5559
>     https://issues.apache.org/jira/browse/MESOS-5559
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> All logic to do GPU allocation is now conducted asynchronously through
> the `NvidiaGpuAllocator` component.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/gpu/nvidia.hpp 28ad0b3d1d8e7642a93e4ebb0fcc399e182be393

>   src/slave/containerizer/mesos/isolators/gpu/nvidia.cpp cfb23244d0e6f6e0d94685a35826b35f3297c6fe

> 
> Diff: https://reviews.apache.org/r/48373/diff/
> 
> 
> Testing
> -------
> 
> GTEST_FILTER="" make -j check && sudo GTEST_FILTER="*NVIDIA*" src/mesos-tests
> 
> 
> Thanks,
> 
> Kevin Klues
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message