mesos-reviews mailing list archives

From Benjamin Mahler <bmah...@apache.org>
Subject Re: Review Request 69775: Updated master fail() logging from FATAL to ERROR.
Date Thu, 17 Jan 2019 00:43:48 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69775/#review212092
-----------------------------------------------------------



Thanks for doing this; it will avoid a lot of confusion around the master recovery failure
case!

Can you list all the `fail()` cases and make sure that they will output a clear message now
that there's no stack trace?

* failure to recover master (this definitely shouldn't produce a stack trace; removing it
will help avoid a lot of confusion)
* failure to mark agent unreachable (it's odd that this particular registry operation is handled
via `fail()` and the others are not)
* failure to acquire agent removal rate limit token (this should never fail, so a stack trace
is actually desirable?)

I'm also inclined to not keep `fail()` and either use lambdas or have wrappers for the common
cases. For example, have all registry operations go through the same wrapper:

```
  // Old: each call site attaches its own failure handling.
  registrar->apply(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
    .<something> // this part is done inconsistently

  // New: every registry operation goes through the same wrapper.
  applyRegistryOperation(Owned<RegistryOperation>(new AdmitSlave(slaveInfo_)))
    .then( ... );

Future<bool> applyRegistryOperation(Owned<RegistryOperation>&& operation)
{
  return registrar->apply(std::move(operation))
    .onAbandoned(...) // LOG(FATAL) << ...;
    .onDiscarded(...) // LOG(FATAL) << ...;
    .onFailed(...); // EXIT(EXIT_FAILURE) << ...; ?
}
```
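
To make the "one wrapper for all registry operations" idea concrete without pulling in libprocess, here is a rough, self-contained sketch. Everything in it is a stand-in invented for illustration: `State`, `Result`, `Action`, and `Handled` are hypothetical types, the log strings are made up, and the wrapper returns a decision instead of actually calling `LOG(FATAL)` or `EXIT`. The only point is that the abandoned / discarded / failed policy lives in one place.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for the terminal states of a Future<bool>.
enum class State { READY, FAILED, DISCARDED, ABANDONED };

// Hypothetical stand-in for the outcome of a registry operation.
struct Result
{
  State state;
  bool value;          // Meaningful only when state == READY.
  std::string message; // Meaningful only when state == FAILED.
};

// What the wrapper decided to do. A real master would log and exit here;
// this sketch only records the decision so that it can be inspected.
enum class Action { CONTINUE, EXIT_WITH_ERROR, ABORT_WITH_STACK_TRACE };

struct Handled
{
  Action action;
  bool value;      // Forwarded from Result when action == CONTINUE.
  std::string log; // Message that would be logged; empty on success.
};

// Single choke point: every registry operation is run through this
// function, so no call site can forget or vary the failure handling.
Handled applyRegistryOperation(const std::function<Result()>& operation)
{
  assert(operation != nullptr);

  const Result result = operation();

  switch (result.state) {
    case State::READY:
      return {Action::CONTINUE, result.value, ""};
    case State::FAILED:
      // Assumed policy: clear error message, no stack trace.
      return {Action::EXIT_WITH_ERROR,
              false,
              "Failed to update registry: " + result.message};
    case State::DISCARDED:
      // Assumed policy: unexpected, so a stack trace is wanted.
      return {Action::ABORT_WITH_STACK_TRACE,
              false,
              "Registry operation discarded"};
    case State::ABANDONED:
      return {Action::ABORT_WITH_STACK_TRACE,
              false,
              "Registry operation abandoned"};
  }

  // Not reachable for valid enum values; keeps compilers quiet.
  return {Action::ABORT_WITH_STACK_TRACE, false, "Unknown state"};
}
```

A call site would then only deal with the success path, e.g. `applyRegistryOperation([] { return Result{State::READY, true, ""}; })`, and check `action` and `value` on the returned `Handled`.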

Some cases don't even handle the failures? :O

E.g. https://github.com/apache/mesos/blob/f01853aea4eaa3df6dec3f7342e5583f5addd07d/src/master/master.cpp#L1745-L1746

Ideally, the return type of the registry apply operation would allow us to distinguish between
timeout and other failures. E.g. `Future<variant<TimeoutError, bool>>`
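
A minimal sketch of what that could look like on the consuming side, using C++17 `std::variant` and leaving the `Future` part out. The `TimeoutError` struct, the `Outcome` enum, and `classify()` are hypothetical names for illustration, and the sketch assumes the `bool` says whether the operation was applied:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical error type; the suggestion above only names `TimeoutError`.
struct TimeoutError
{
  std::string message;
};

// The value a registry apply would produce under this idea.
using ApplyResult = std::variant<TimeoutError, bool>;

enum class Outcome { APPLIED, REJECTED, TIMED_OUT };

// Callers can tell a timeout apart from an ordinary true/false answer
// without parsing a failure string.
Outcome classify(const ApplyResult& result)
{
  if (std::holds_alternative<TimeoutError>(result)) {
    return Outcome::TIMED_OUT;
  }

  // Only two alternatives exist, so this must be the bool.
  assert(std::holds_alternative<bool>(result));

  return std::get<bool>(result) ? Outcome::APPLIED : Outcome::REJECTED;
}
```

With this shape, a timeout could get a clear error message and a clean exit, while any other failure of the future itself could keep the stack trace.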

- Benjamin Mahler


On Jan. 16, 2019, 11:07 p.m., Gilbert Song wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69775/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2019, 11:07 p.m.)
> 
> 
> Review request for mesos, Andrei Budnik, Benjamin Mahler, Greg Mann, and Qian Zhang.
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> LOG(FATAL) dumps a stack trace, which people may mistake for
> a master crash. We should just print an error message.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 2339207149a85578ea47cf66f28392182f9075f2 
> 
> 
> Diff: https://reviews.apache.org/r/69775/diff/2/
> 
> 
> Testing
> -------
> 
> N/A
> 
> 
> Thanks,
> 
> Gilbert Song
> 
>

