mesos-reviews mailing list archives

From Chun-Hung Hsiao <chhs...@apache.org>
Subject Re: Review Request 69892: Made SLRP recover node-published volumes after reboot.
Date Tue, 05 Feb 2019 20:08:57 GMT


> On Feb. 5, 2019, 5:41 p.m., Benjamin Bannier wrote:
> > src/csi/state.proto
> > Lines 62-67 (original), 62-77 (patched)
> > <https://reviews.apache.org/r/69892/diff/1/?file=2123913#file2123913line62>
> >
> >     Any reason we cannot use a single field containing the `bootId` of the last
> >     transition? A single field would cut down on the number of possible message permutations,
> >     and also allow simpler handling (branching a changed `boot_id`, triggering `state`-dependent
> >     handling). We could set such a `boot_id` whenever there is a state transition.

Consider the following scenario:
`CREATED` -> `NODE_READY` -> `VOL_READY` -> `PUBLISHED` -> reboot -> `VOL_READY` -> reboot
If we share a single `boot_id` across all transitions, it gets overwritten at the second
`VOL_READY`, so we won't be able to tell that this volume has been published before.
If we dedicate `boot_id` to `PUBLISHED`, we won't be able to tell that there has been a reboot
since the last `VOL_READY`, and hence that we need to call `NodeStageVolume` again.
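
To make the scenario concrete, here is a minimal sketch of how two separate boot IDs could drive the recovery decision. It is not the code in the patch: `published_boot_id`, `RecoveryPlan`, and `planRecovery` are hypothetical names used only for illustration, and the sketch assumes an empty `published_boot_id` means the volume is not expected to be published (never published, or cleared on unpublish).

```cpp
#include <string>

enum class State { CREATED, NODE_READY, VOL_READY, PUBLISHED };

struct VolumeState {
  State state;
  // Boot ID recorded when the volume was last node-staged.
  std::string vol_ready_boot_id;
  // Hypothetical field name: boot ID recorded when the volume was last
  // node-published. Assumed empty if the volume should not be published.
  std::string published_boot_id;
};

// Hypothetical result type: which CSI calls recovery would need to redo.
struct RecoveryPlan {
  bool restage;    // call `NodeStageVolume` again
  bool republish;  // bring the volume back to node-published
};

RecoveryPlan planRecovery(
    const VolumeState& volume,
    const std::string& currentBootId)
{
  RecoveryPlan plan{false, false};

  const bool staged =
    volume.state == State::VOL_READY || volume.state == State::PUBLISHED;

  // Staging does not survive a reboot, so a stale staging boot ID means
  // the volume has to be staged again.
  if (staged && volume.vol_ready_boot_id != currentBootId) {
    plan.restage = true;
  }

  // A non-empty, stale publish boot ID means the volume was published
  // before a reboot, regardless of what happened to the staging ID since.
  if (!volume.published_boot_id.empty() &&
      volume.published_boot_id != currentBootId) {
    plan.republish = true;
  }

  return plan;
}
```

For the scenario above, with the volume published during boot A, re-staged during boot B, and recovered during boot C, `planRecovery({State::VOL_READY, "boot-B", "boot-A"}, "boot-C")` sets both `restage` and `republish`. A single shared field would hold only `boot-B` and lose the publish history, while a publish-only field would hold only `boot-A` and could not say whether staging is still valid for the current boot.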


> On Feb. 5, 2019, 5:41 p.m., Benjamin Bannier wrote:
> > src/resource_provider/storage/provider.cpp
> > Line 817 (original), 827 (patched)
> > <https://reviews.apache.org/r/69892/diff/1/?file=2123914#file2123914line830>
> >
> >     Is us not using a sequence anymore related to what is being done here?
> >     
> >     Here and below.

No. The sequence is still used. Note that the `recovered` future is wrapped by a `recoverVolume`
lambda, and we put the lambda in the sequence so all steps will be executed as a whole without
being interleaved with other calls. Dropping.
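
The effect of putting the whole lambda in the sequence can be illustrated with a toy model. This is a sketch under stated assumptions, not the provider code and not the libprocess API: `Loop`, `ToySequence`, `Operation`, and `runDemo` are hypothetical, and a single-threaded task queue stands in for asynchronous dispatch.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded event loop: runs queued tasks in FIFO order.
struct Loop {
  std::deque<std::function<void()>> tasks;

  void post(std::function<void()> task) { tasks.push_back(std::move(task)); }

  void run() {
    while (!tasks.empty()) {
      std::function<void()> task = std::move(tasks.front());
      tasks.pop_front();
      task();
    }
  }
};

// A multi-step operation; it invokes `done` once its last step finished.
using Operation = std::function<void(std::function<void()> done)>;

// Toy sequence: starts the next operation only after the previous one
// has signalled completion.
class ToySequence {
public:
  explicit ToySequence(Loop& loop) : loop_(loop) {}

  void add(Operation op) {
    pending_.push_back(std::move(op));
    if (!busy_) { startNext(); }
  }

private:
  void startNext() {
    if (pending_.empty()) { busy_ = false; return; }
    busy_ = true;
    Operation op = std::move(pending_.front());
    pending_.pop_front();
    loop_.post([this, op] { op([this] { startNext(); }); });
  }

  Loop& loop_;
  std::deque<Operation> pending_;
  bool busy_ = false;
};

// Returns the order in which steps ran, with or without the sequence.
std::vector<std::string> runDemo(bool useSequence) {
  Loop loop;
  ToySequence sequence(loop);
  std::vector<std::string> log;

  // Two-step recovery: the second step is dispatched asynchronously.
  Operation recoverVolume = [&](std::function<void()> done) {
    log.push_back("stage");
    loop.post([&log, done] { log.push_back("publish"); done(); });
  };

  Operation otherCall = [&](std::function<void()> done) {
    log.push_back("other");
    done();
  };

  if (useSequence) {
    sequence.add(recoverVolume);
    sequence.add(otherCall);
  } else {
    loop.post([&] { recoverVolume([] {}); });
    loop.post([&] { otherCall([] {}); });
  }

  loop.run();
  return log;
}
```

In this model `runDemo(true)` yields `stage, publish, other`, while `runDemo(false)` yields `stage, other, publish`: without the sequence the unrelated call slips in between the two recovery steps.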


- Chun-Hung


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69892/#review212557
-----------------------------------------------------------


On Feb. 5, 2019, 7:40 a.m., Chun-Hung Hsiao wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69892/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2019, 7:40 a.m.)
> 
> 
> Review request for mesos, Benjamin Bannier, James DeFelice, and Jie Yu.
> 
> 
> Bugs: MESOS-9544
>     https://issues.apache.org/jira/browse/MESOS-9544
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> If a CSI volume has been node-published before a reboot, SLRP will now
> try to bring it back to node-published again. This is important to
> perform synchronous persistent volume cleanup for `DESTROY`.
> 
> To achieve this, in addition to keeping track of the boot ID when a CSI
> volume is node-staged in `VolumeState.vol_ready_boot_id` (formerly
> `VolumeState.boot_id`), SLRP now also keeps track of the boot ID when
> the volume is node-published. This helps SLRP to better determine if a
> volume has been published before reboot.
> 
> 
> Diffs
> -----
> 
>   src/csi/state.proto 264a5657dd37605a6f3bdadd0e8d18ba9673191a
>   src/resource_provider/storage/provider.cpp d6e20a549ede189c757ae3ae922ab7cb86d2be2c
>   src/tests/storage_local_resource_provider_tests.cpp e8ed20f818ed7f1a3ce15758ea3c366520443377
> 
> 
> Diff: https://reviews.apache.org/r/69892/diff/1/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> Testing for publish failures will be done later in chain.
> 
> 
> Thanks,
> 
> Chun-Hung Hsiao
> 
>

