libcloud-notifications mailing list archives

From Stefan Müller (JIRA) <j...@apache.org>
Subject [jira] [Created] (LIBCLOUD-532) deploy_node(..) occasionally fails on EC2
Date Wed, 19 Mar 2014 12:39:42 GMT
Stefan Müller created LIBCLOUD-532:
--------------------------------------

             Summary: deploy_node(..) occasionally fails on EC2
                 Key: LIBCLOUD-532
                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-532
             Project: Libcloud
          Issue Type: Bug
          Components: Compute
         Environment: apache-libcloud 0.14.1, Windows 7
            Reporter: Stefan Müller


h2. Observed behaviour:

When I'm starting EC2 nodes with {{deploy_node(ssh_key=...)}}, I occasionally (about 50% of
the time) get an error message indicating that my key is not a valid DSA key.

This seems a bit odd, since I'm using an RSA key. 

h2. Cause

It turns out the cause lies elsewhere:

When a node starts, there is a short window during which the SSH daemon is already up and
running, but the public key has not yet been added to the {{authorized_keys}} file. Apparently
the SSH daemon is started before Amazon's key-injection magic has finished.

During this window (I'd guess about a second), the SSH server rejects the private key with an
authentication error.

libcloud then falls back to other means of authentication, during which it apparently tries to
parse the key as a DSA key, causing the misleading error reported above.

Note that the extra-long timeout used for the SSH connection attempt does not help in this
case, since the SSH server is already replying.
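
To illustrate why a longer connection timeout cannot help, here is a minimal sketch (not libcloud's code) of a check that only waits until the SSH port accepts TCP connections. The helper {{wait_for_port}} and its parameters are hypothetical. It returns as soon as the daemon is listening, which, in the scenario above, can happen before the key has been injected, so a subsequent authentication can still fail.

{code:python}
import socket
import time


def wait_for_port(host, port=22, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to (host, port) succeeds, else False.

    Hypothetical helper: success only proves that something is listening on
    the port. It says nothing about whether the public key is already in
    authorized_keys, so SSH authentication may still be rejected.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
{code}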

h3. Suggested Fix

I suggest reacting to a failed authentication with a few retries, with a delay of a second or
two between them, similar to {{wait_until_running()}}.
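
A minimal sketch of the suggested retry, in plain Python and not tied to libcloud's internals. The names {{retry}}, {{attempts}} and {{delay}} are hypothetical; {{func}} stands for whatever opens and authenticates the SSH session, and {{exceptions}} for the error it raises on a rejected key (with paramiko, that could be {{paramiko.AuthenticationException}}).

{code:python}
import time


def retry(func, exceptions, attempts=5, delay=2.0):
    """Call func() until it succeeds, retrying only on the given exceptions.

    Hypothetical helper sketching the suggested fix: a rejected key is
    retried a few times with a short pause instead of failing immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the last authentication error
            time.sleep(delay)  # give the key injection time to finish
{code}

Retrying only on the authentication error keeps unrelated failures (e.g. an unreachable host) from being masked, and re-raising after the last attempt preserves the original error message.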

h3. Workaround

{code}
# each extra "root" entry gives libcloud one more authentication attempt
deploy_node(..., ssh_alternate_usernames=["root" for _ in range(10)])
{code}

This causes libcloud to make several authentication attempts, which take long enough for the
public key to be in place before they run out. This solves the problem reliably, but not elegantly :)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
