mesos-reviews mailing list archives

From Jie Yu <yujie....@gmail.com>
Subject Re: Review Request 65465: Windows: Fixed recovery of Mesos containerizer.
Date Thu, 01 Feb 2018 22:32:39 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65465/#review196662
-----------------------------------------------------------




src/slave/containerizer/mesos/main.cpp
Lines 40-50 (patched)
<https://reviews.apache.org/r/65465/#comment276403>

    Flying by. Why is this logic not in launch.cpp? It sounds to me like it's unrelated to, for example, Mount below.


- Jie Yu


On Feb. 1, 2018, 7:57 p.m., Andrew Schwartzmeyer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65465/
> -----------------------------------------------------------
> 
> (Updated Feb. 1, 2018, 7:57 p.m.)
> 
> 
> Review request for mesos, Akash Gupta, Jie Yu, and Joseph Wu.
> 
> 
> Bugs: MESOS-8519
>     https://issues.apache.org/jira/browse/MESOS-8519
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> The Windows OS deletes the job object created in the agent process when
> the agent dies, because no other process holds a handle to it (despite
> processes being assigned to the job object). While this is
> counter-intuitive, it is the observed behavior. So, in order for recovery
> to succeed, the containerizer must also hold an otherwise unused handle
> to its job object to keep it alive in the kernel and available for
> recovery to find.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/main.cpp a53ccd68bf975d919f9d1f920cf3fa74d4e43f24 
> 
> 
> Diff: https://reviews.apache.org/r/65465/diff/1/
> 
> 
> Testing
> -------
> 
> ```
> [----------] Global test environment tear-down
> [==========] 874 tests from 85 test cases ran. (253311 ms total)
> [  PASSED  ] 874 tests.
> 
> I0201 12:46:58.159368  3116 slave.cpp:6921] Recovering framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.159368  3116 slave.cpp:8543] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.162847  9456 task_status_update_manager.cpp:207] Recovering task status update manager
> I0201 12:46:58.162847  9456 task_status_update_manager.cpp:215] Recovering executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.166851  7344 containerizer.cpp:674] Recovering containerizer
> I0201 12:46:58.167351  7344 containerizer.cpp:731] Recovering container 69cefa53-61e0-444b-a808-e38ffb4cb18f for executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:46:58.183379 17088 provisioner.cpp:493] Provisioner recovery complete
> I0201 12:46:58.186367 16792 slave.cpp:6695] Sending reconnect request to executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 at executor(1)@10.123.7.41:52591
> I0201 12:46:58.194370  7344 slave.cpp:4519] Received re-registration message from executor 'notepad.01d79d48-0791-11e8-8f77-02421c3bc93c' of framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000
> I0201 12:47:00.193958 16792 slave.cpp:4737] Cleaning up un-reregistered executors
> I0201 12:47:00.193958 16792 slave.cpp:6824] Finished recovery
> I0201 12:47:00.200943  9456 task_status_update_manager.cpp:181] Pausing sending task status updates
> I0201 12:47:00.200943  3116 slave.cpp:1146] New master detected at master@10.123.6.78:5050
> I0201 12:47:00.200943  3116 slave.cpp:1190] No credentials provided. Attempting to register without authentication
> I0201 12:47:00.200943  3116 slave.cpp:1201] Detecting new master
> I0201 12:47:00.214944 16792 slave.cpp:1471] Re-registered with master master@10.123.6.78:5050
> I0201 12:47:00.214944 13180 task_status_update_manager.cpp:188] Resuming sending task status updates
> I0201 12:47:00.215942 16792 slave.cpp:1516] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"jLIL1d\/PQnuwmFxpMf8CLQ=="},"slave_id":{"value":"7dc02270-a4e1-4f59-9ad7-56bad5182ea4S3"},"update_oversubscribed_resources":true}
> I0201 12:47:00.219952  3116 slave.cpp:3625] Updating info for framework eb32cef4-c503-4ab7-85d4-8d4577e6a3bf-0000 with pid updated to scheduler-aaa62980-8b1b-4775-b8bb-c6890b41941e@10.123.6.78:45907
> I0201 12:47:00.233942  7344 task_status_update_manager.cpp:188] Resuming sending task status updates
> ```
> 
> 
> Thanks,
> 
> Andrew Schwartzmeyer
> 
>

