mesos-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Qian Zhang <zhang...@cn.ibm.com>
Subject Re: Review Request 44269: Added the framework of 'network/cni' isolator.
Date Mon, 07 Mar 2016 15:39:55 GMT


> On March 4, 2016, 11:30 p.m., Avinash sridharan wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.cpp, line 134
> > <https://reviews.apache.org/r/44269/diff/4/?file=1280944#file1280944line134>
> >
> >     I don't understand this comment. We just made sure the plugin does not exist?
So what does the comment imply "it can
> >             // still be valid as long as operator puts the CNI plugin binary
> >             // that it uses under '--network_cni_plugins_dir'." ?
> >             
> >     I think at this point we should return an error. If can't find an executable
for a named network, the behavior will become undefined. We should bail at this point.
> 
> Qian Zhang wrote:
>     My point is, if we can not find a plugin for a named network during initilization,
log a warning message to let operator know this issue, and afterward operator can put the
plugin in the plugin directory without restarting agent, then the named network can still
work.
> 
> Avinash sridharan wrote:
>     Lets not rely on the operator heeding WARNING messages and fixing the problem. My
concern is that this is a `FATAL` error since before the operator can rectify the error if
containers are launched the behavior becomes undefined.
> 
> Qian Zhang wrote:
>     Agree, let's return an error :-)
> 
> Qian Zhang wrote:
>     After more thinking, I think in this case, it makes more sense to log a warning message
and ignore the network config file rather than bail at this point, because there can be other
valid network config files. If in the end there is no any valid network config files, we should
definitely bail.
> 
> Avinash sridharan wrote:
>     I think we should not allow any errors in the configs/plugins passed by the operator
. Reason being that frameworks are going to learn about networks out-of-band, and if there
are config/plugin errors we will have to throw errors during task launch. Why should we allow
the system to proceed knowing that this is going to lead to erroneous situation? The only
way the operator can fix this error is by restarting the slave (and fixing the config), so
might as well bail out sooner rather than later.
> 
> Qian Zhang wrote:
>     Can you please let me know how this can lead to erroneous situation? If a network
config file is invalid for whatever reason, "network/isolator" will NOT load it and just ignore
it, so how can framework launches a task to join an invalid network which is not loaded by
the isolator? I do not think framework user has such knowledge, or you think framework user
will know all the network config files (valid or invalid) under "--network_cni_config_dir"
in some way?
> 
> Avinash sridharan wrote:
>     Frameworks would know only the network name. Its the responsibility of the operator
to install the right config for the given `name`. Hence the erroneous case. The fact that
config was not loaded for valid network name cause inconsistency between the frameworks view
of what is available and the isolators view of what is configurable.
> 
> Qian Zhang wrote:
>     What about framework specifies a wrong network name by mistake? Even in `create()`
method we ensure every network config file is valid and agent is started successfuly, there
is still a chance for framework to specify a wrong network name (which is actually out of
our control), right? So my point is, we have to handle this erroneous in launching task case
anyway.
>     
>     And I do not quite understand what is `the frameworks view of what is available`,
can you please elaborate how framework can know what is available? My thinking is, in future
we may expose the available CNI networks to framework via an HTTP endpoint or even via an
offer as shared resources, but the networks exposed in this way must be the ones which are
valid, the invalids will not be loaded by isolator, hense will not be exposed.
> 
> Avinash sridharan wrote:
>     Let me try explaining my perspective differently. If the operator was to provide
erroneous parameter to the isolator, the isolator would  bail out. My perspective is that
an erroneous network config is an erroneous parameter being provided to the isolator and we
should treat it so. I believe this would simplify any error handling that needs to be done
when frameworks launch tasks on misconfigured networks.

Can you please elaborate how this would simplify any error handling that needs to be done
when frameworks launch tasks on misconfigured networks? How can framework launches task on
a misconfigured network which will actually not be loaded by isolator?


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44269/#review122077
-----------------------------------------------------------


On March 7, 2016, 12:03 a.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44269/
> -----------------------------------------------------------
> 
> (Updated March 7, 2016, 12:03 a.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-4759
>     https://issues.apache.org/jira/browse/MESOS-4759
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Added the framework of 'network/cni' isolator.
> 
> 
> Diffs
> -----
> 
>   src/CMakeLists.txt 8f57a5701073bf1eaaa223383e928cf5db8f8ae4 
>   src/Makefile.am a41e95ddeb838fdebf4ced953c4a29181916e261 
>   src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/44269/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message