whirr-user mailing list archives

From Andrei Savu <savu.and...@gmail.com>
Subject Re: ec2 hadoop cluster creation failure (from ec2)
Date Mon, 17 Oct 2011 20:48:37 GMT
Chris,

We are working on upgrading trunk to jclouds 1.2.1, which should fix some of
the SSH-related issues and make things more reliable. With the current
version, I think you should handle these errors by retrying the cluster
build process from scratch.
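
A minimal sketch of what retrying from scratch could look like, written as a
small Python wrapper around the Whirr command line. It assumes that `whirr` is
on the PATH, that `launch-cluster` exits with a non-zero status when the build
fails, and that the cluster is described by a properties file; the function
name, the retry count, and the wait time are illustrative choices, not
anything Whirr prescribes.

```python
import subprocess
import time


def launch_with_retries(config_path, attempts=3, wait_seconds=30,
                        run=subprocess.call, sleep=time.sleep):
    """Launch a cluster, rebuilding from scratch after each failed attempt.

    Returns the number of the attempt that succeeded. `run` and `sleep` are
    injectable so the retry logic can be exercised without touching EC2.
    """
    for attempt in range(1, attempts + 1):
        # Assumption: a non-zero exit status means the cluster build failed.
        if run(["whirr", "launch-cluster", "--config", config_path]) == 0:
            return attempt
        # Tear down whatever was started so the next attempt begins clean
        # and no half-configured instances are left running.
        run(["whirr", "destroy-cluster", "--config", config_path])
        if attempt < attempts:
            sleep(wait_seconds)
    raise RuntimeError("cluster launch failed after %d attempts" % attempts)
```

For example, `launch_with_retries("hadoop.properties")` would try up to three
times (the file name here is hypothetical). Destroying the cluster between
attempts matters because a failed launch can leave instances running.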

Cheers,

-- Andrei Savu

On Sat, Oct 15, 2011 at 6:42 AM, Chris Schilling
<chris@thecleversense.com> wrote:

> Hi,
>
> I think I am also seeing this problem:
> https://issues.apache.org/jira/browse/WHIRR-378
>
> I am trying to run whirr from an ec2 instance.  The failure occurs after
> the machines are launched:
> Starting to run configuration scripts on cluster instances:
> us-east-1/i-cacff4aa
> Starting to run configuration scripts on cluster instances:
> us-east-1/i-c4cff4a4
> Running configuration script on: us-east-1/i-cacff4aa
> Running configuration script on: us-east-1/i-c4cff4a4
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> <<authenticated>> woke to: net.schmizz.sshj.userauth.UserAuthException:
> publickey auth failed
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
> Dying because - java.net.SocketTimeoutException: Read timed out
>
> My initialization script is rather simple:
> whirr.cluster-name=whirr-hadoop
> whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker
> whirr.provider=aws-ec2
> whirr.identity=${env:AWS_ACCESS_KEY_ID}
> whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
> hadoop-mapreduce.mapred.child.java.opts=-Xmx1000m
> hadoop-mapreduce.mapred.child.ulimit=1500000
>
> I also tried running this script with these two lines (but I don't think it
> matters):
>
> whirr.private-key-file=${sys:user.home}/.ssh/id_rsa
> whirr.public-key-file=${sys:user.home}/.ssh/id_rsa.pu
>
> Anyway, is there a solution available for this JIRA issue?  Andrei, you
> mentioned that you had it running from ec2 correctly.  Perhaps you have
> some insight?  I can provide a log file if necessary.
>
> P.S.  This works properly when running from my local machine!
>
> Again, thanks for the support.  If there is anything I can do to help
> debug, please let me know!
> Chris Schilling
> Sr. Data Mining Engineer
> Clever Sense, Inc.
> "Curating the World Around You"
> --------------------------------------------------------------
> Winner of the 2011 Fortune Brainstorm Start-up Idol <http://tech.fortune.cnn.com/2011/07/20/startup-idol-brainstorm-clever-sense/>
>
> Wanna join the Clever Team? We're hiring! <http://www.thecleversense.com/jobs.html>
> --------------------------------------------------------------
>
>
