libcloud-notifications mailing list archives

From anthonys...@apache.org
Subject libcloud git commit: Fix a race condition in GCE’s list_nodes()
Date Mon, 11 Apr 2016 23:20:12 GMT
Repository: libcloud
Updated Branches:
  refs/heads/trunk b444381e9 -> b1d073195


Fix a race condition in GCE’s list_nodes()
Closes #727
Invoking GCE’s `list_nodes()` while some VMs are being shut down can cause
the following exception to be raised by `list_nodes()`:

```
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 1411, in list_nodes
    v.get('instances', [])]
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 5065, in _to_node
    extra['boot_disk'] = self.ex_get_volume(bd['name'], bd['zone'])
  File "/usr/lib/python2.7/site-packages/libcloud/compute/drivers/gce.py", line 3982, in ex_get_volume
    response = self.connection.request(request, method='GET').object
  File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 684, in request
    *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 736, in request
    response = responseCls(**kwargs)
  File "/usr/lib/python2.7/site-packages/libcloud/common/base.py", line 119, in __init__
    self.object = self.parse_body()
  File "/usr/lib/python2.7/site-packages/libcloud/common/google.py", line 259, in parse_body
    raise ResourceNotFoundError(message, self.status, code)
libcloud.common.google.ResourceNotFoundError: {'domain': 'global', 'message': "The resource 'projects/lenaic/zones/europe-west1-c/disks/devops-reg' was not found", 'reason': 'notFound'}
```

The above error occurred while the `devops-reg` machine was being deleted.

The issue occurs when the following events happen in this order:

* [`list_nodes()` sends a request to list all the instances.](https://github.com/apache/libcloud/blob/trunk/libcloud/compute/drivers/gce.py#L1622)
  At this point, the `devops-reg` instance still exists.
* The `devops-reg` instance is deleted.
* `list_nodes()` calls `_to_node()`, which calls [`ex_get_volume()` to retrieve information about its volumes](https://github.com/apache/libcloud/blob/trunk/libcloud/compute/drivers/gce.py#L4235).
  But because the instance was deleted after it was listed, `ex_get_volume()` raises a `ResourceNotFoundError` exception.
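
The sequence above can be reproduced without touching GCE. The following is only a sketch: `FakeGCE`, `to_node`, and the local `ResourceNotFoundError` are hypothetical stand-ins, not libcloud code, and the fake assumes that deleting an instance also removes its boot disk.

```python
class ResourceNotFoundError(Exception):
    """Stand-in for libcloud.common.google.ResourceNotFoundError."""


class FakeGCE:
    """Hypothetical in-memory backend; not part of libcloud."""

    def __init__(self):
        # Assumption: each instance owns a boot disk with the same name.
        self.disks = {"devops-reg": {"sizeGb": 10}, "web-1": {"sizeGb": 20}}

    def list_instances(self):
        return [{"name": name, "boot_disk": name} for name in self.disks]

    def delete_instance(self, name):
        # Deleting the instance also removes its boot disk.
        del self.disks[name]

    def get_volume(self, name):
        if name not in self.disks:
            raise ResourceNotFoundError("disk %r was not found" % name)
        return self.disks[name]


def to_node(api, instance):
    # Mirrors the second lookup made for the boot disk during conversion.
    return {"name": instance["name"],
            "boot_disk": api.get_volume(instance["boot_disk"])}


api = FakeGCE()
instances = api.list_instances()      # step 1: the instances are listed
api.delete_instance("devops-reg")     # step 2: one instance is deleted

try:
    # Step 3: converting the stale listing fails for the whole list.
    nodes = [to_node(api, i) for i in instances]
    outcome = "ok"
except ResourceNotFoundError as exc:
    outcome = "failed: %s" % exc
```

A single stale entry is enough to make the whole conversion fail, even though `web-1` is still healthy.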

When this happens, we should simply discard the node that was deleted during the execution
of `list_nodes()` and return the information about the other nodes.
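
The intended behaviour can be sketched as a small helper. This is an illustration under stated assumptions, not the code of this commit: `convert_skipping_missing`, `to_node`, and `existing_disks` are hypothetical names, and the exception class is a local stand-in for `libcloud.common.google.ResourceNotFoundError`.

```python
class ResourceNotFoundError(Exception):
    """Stand-in for libcloud.common.google.ResourceNotFoundError."""


def convert_skipping_missing(items, convert):
    """Convert each item, dropping those whose backing resource vanished."""
    converted = []
    for item in items:
        try:
            converted.append(convert(item))
        except ResourceNotFoundError:
            # The resource was deleted after it was listed; skip it.
            continue
    return converted


# Hypothetical data: "devops-reg" was listed, then its disk was deleted.
existing_disks = {"web-1": 20, "db-1": 50}


def to_node(name):
    if name not in existing_disks:
        raise ResourceNotFoundError(name)
    return {"name": name, "disk_gb": existing_disks[name]}


nodes = convert_skipping_missing(["web-1", "devops-reg", "db-1"], to_node)
names = [n["name"] for n in nodes]
```

Only `ResourceNotFoundError` is caught, so any other failure during conversion still propagates to the caller. A helper of this shape could also serve both the aggregated and the single-zone branch, but that is a possible refactoring rather than what the diff below does.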


Project: http://git-wip-us.apache.org/repos/asf/libcloud/repo
Commit: http://git-wip-us.apache.org/repos/asf/libcloud/commit/b1d07319
Tree: http://git-wip-us.apache.org/repos/asf/libcloud/tree/b1d07319
Diff: http://git-wip-us.apache.org/repos/asf/libcloud/diff/b1d07319

Branch: refs/heads/trunk
Commit: b1d0731959637ab9df8876a3f3b9ad9fb2f38efd
Parents: b444381
Author: Lénaïc Huard <lhuard@amadeus.com>
Authored: Fri Mar 25 17:00:48 2016 +0100
Committer: anthony-shaw <anthony.p.shaw@gmail.com>
Committed: Tue Apr 12 09:19:46 2016 +1000

----------------------------------------------------------------------
 libcloud/compute/drivers/gce.py | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/libcloud/blob/b1d07319/libcloud/compute/drivers/gce.py
----------------------------------------------------------------------
diff --git a/libcloud/compute/drivers/gce.py b/libcloud/compute/drivers/gce.py
index 14e10ab..4aa92ac 100644
--- a/libcloud/compute/drivers/gce.py
+++ b/libcloud/compute/drivers/gce.py
@@ -1625,11 +1625,31 @@ class GCENodeDriver(NodeDriver):
             # The aggregated response returns a dict for each zone
             if zone is None:
                 for v in response['items'].values():
-                    zone_nodes = [self._to_node(i) for i in
-                                  v.get('instances', [])]
-                    list_nodes.extend(zone_nodes)
+                    for i in v.get('instances', []):
+                        try:
+                            list_nodes.append(self._to_node(i))
+                        # If a GCE node has been deleted between
+                        #   - it was listed by `request('.../instances', 'GET')`
+                        #   - it is converted by `self._to_node(i)`
+                        # `_to_node()` will raise a ResourceNotFoundError.
+                        #
+                        # Just ignore that node and return the list of the
+                        # other nodes.
+                        except ResourceNotFoundError:
+                            pass
             else:
-                list_nodes = [self._to_node(i) for i in response['items']]
+                for i in response['items']:
+                    try:
+                        list_nodes.append(self._to_node(i))
+                    # If a GCE node has been deleted between
+                    #   - it was listed by `request('.../instances', 'GET')`
+                    #   - it is converted by `self._to_node(i)`
+                    # `_to_node()` will raise a ResourceNotFoundError.
+                    #
+                    # Just ignore that node and return the list of the
+                    # other nodes.
+                    except ResourceNotFoundError:
+                        pass
         return list_nodes
 
     def ex_list_regions(self):

