Paolo,

I think we have only a few users who use Whirr to deploy clusters with more than 10 nodes.

I suggest taking a look at the configuration page: there are some settings you can tweak so that Whirr can start larger clusters.
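
For context, the RequestLimitExceeded errors in your log mean EC2 is throttling API calls, and the usual client-side remedy is to retry with exponential backoff. Below is a minimal sketch of that idea in Python. It is not Whirr's or jclouds' actual code, and every name in it (RequestLimitExceeded, call_with_backoff, the parameters) is hypothetical:

```python
import random
import time


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for a throttling error returned by a cloud API."""


def call_with_backoff(request, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call request(), retrying throttled calls with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RequestLimitExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Upper bound doubles on every retry, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel callers from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

The jitter matters when many nodes are started in parallel, because retries that all fire at the same instant would hit the request limit again.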

Tibor, any feedback on this? How are you handling similar issues?

On Oct 7, 2011 5:07 PM, "Paolo Castagna" <castagna.lists@googlemail.com> wrote:
Hi,
I am using Apache Whirr 0.6.0-incubating.

When I start a Hadoop cluster on EC2 using 11 datanodes/tasktrackers:
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,11 hadoop-datanode+hadoop-tasktracker
everything seems to go fine. I sometimes see one or two instances not
able to start correctly,
but Whirr seems to terminate those and restart new ones.

If I try to run a Hadoop cluster using 20 or more
datanodes/tasktrackers the number of errors increases.

I see a lot of errors like this:

2011-10-07 07:54:50,058 ERROR [jclouds.compute] (user thread 13) <<
problem applying options to node(eu-west-1/i-eec231a7):
org.jclouds.aws.AWSResponseException: request POST
https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1 failed with code 503,
error: AWSError{requestId='af239496-844a-49c3-99d0-fdf0d01b7f45',
requestToken='null', code='RequestLimitExceeded', message='Request
limit exceeded.', context='{Response=, Errors=}'}
       at org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:74)
       at org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:71)
       at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.shouldContinue(BaseHttpCommandExecutorService.java:200)
       at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:165)
       at org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:134)
       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
       at java.lang.Thread.run(Thread.java:662)

After a while Whirr gives up and fails to start the cluster.

Any idea on why this happens?

Thanks,
Paolo