whirr-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: Cluster launch failure
Date Wed, 07 May 2014 07:04:24 GMT
Sorry to keep posting. I finally got a working cluster, but I had to launch
it from an EC2 node. It still doesn't work with a custom AMI (nothing
special, a CentOS image with Oracle JDK installed), but at least work can
progress.

I would still like to debug why the launch from my local machine fails, and
to get the custom AMI working. With the custom AMI, the node gets its public
DNS name as its hostname, but it is not able to bind to it.
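
For anyone trying to reproduce this, a check along the following lines might
help separate "the name does not resolve" from "the name resolves to an
address that is not on any local interface". It is only a sketch: the
function name check_bind is made up for illustration and is not part of
Whirr or Hadoop, and it handles IPv4 only.

```python
import socket


def check_bind(hostname, port=0):
    """Resolve hostname, then try to bind a TCP socket to the result.

    Returns (resolved_ip, error). error is None when the bind succeeds.
    resolved_ip is None when the name cannot be resolved at all.
    Hypothetical helper for diagnosis only; IPv4 only.
    """
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror as exc:
        # The name is unknown to DNS and /etc/hosts.
        return None, "cannot resolve: %s" % exc

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, so no daemon is disturbed.
        sock.bind((ip, port))
    except OSError as exc:
        # Typical when the name maps to an address this machine does not own.
        return ip, "cannot bind: %s" % exc
    finally:
        sock.close()
    return ip, None
```

Running check_bind(socket.gethostname()) on the node should show whether the
hostname resolves and whether the resolved address can be bound. Running it
on the local machine with ec2.ap-northeast-1.amazonaws.com only tells you
whether the endpoint resolves (the first element is None if it does not);
the bind part is expected to fail there.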

Any suggestions?


On Tue, May 6, 2014 at 12:00 PM, Ashish <paliwalashish@gmail.com> wrote:

> Finally able to launch the cluster. Had to specify an old generation
> hardware id.
>
> The machines are up, but the Hadoop daemons are not running. I am getting
> the following error in the logs (similar to
> https://issues.apache.org/jira/browse/WHIRR-749).
> /etc/hosts doesn't have entries for the nodes either. Any suggestions on
> how to fix the error?
>
> 2014-05-06 06:23:12,262 FATAL org.apache.hadoop.mapred.JobTracker:
> java.net.UnknownHostException: Invalid hostname for server: pig-f50bcfa6
>         at org.apache.hadoop.ipc.Server.bind(Server.java:275)
>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:341)
>         at org.apache.hadoop.ipc.Server.<init>(Server.java:1539)
>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:569)
>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:530)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1985)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1689)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1683)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:320)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:311)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:306)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:4710)
>
>
> On Sat, May 3, 2014 at 6:58 PM, Ashish <paliwalashish@gmail.com> wrote:
>
>> Hi,
>>
>> I am using Whirr 0.8.2.
>>
>> I am not able to launch a cluster (I have tried all regions). I keep
>> getting the following error:
>>
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> org.jclouds.http.HttpResponseException: ec2.ap-northeast-1.amazonaws.com
>> connecting to POST https://ec2.ap-northeast-1.amazonaws.com/ HTTP/1.1
>>
>> at com.google.common.base.Throwables.propagate(Throwables.java:160)
>>
>> at
>> org.jclouds.aws.ec2.compute.suppliers.AWSEC2ImageSupplier.get(AWSEC2ImageSupplier.java:108)
>>
>> at
>> org.jclouds.aws.ec2.compute.suppliers.AWSEC2ImageSupplier.get(AWSEC2ImageSupplier.java:63)
>>
>> at
>> org.jclouds.rest.suppliers.MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier$SetAndThrowAuthorizationExceptionSupplierBackedLoader.load(MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.java:91)
>>
>> at
>> org.jclouds.rest.suppliers.MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier$SetAndThrowAuthorizationExceptionSupplierBackedLoader.load(MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.java:72)
>>
>> at
>> com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3589)
>>
>> at
>> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2374)
>>
>> at
>> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2337)
>>
>> at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2252)
>>
>> at com.google.common.cache.LocalCache.get(LocalCache.java:3990)
>>
>> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3994)
>>
>> at
>> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4878)
>>
>> at
>> org.jclouds.rest.suppliers.MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.get(MemoizedRetryOnTimeOutButNotOnAuthorizationExceptionSupplier.java:140)
>>
>> at
>> org.jclouds.ec2.compute.internal.EC2TemplateBuilderImpl.getImages(EC2TemplateBuilderImpl.java:116)
>>
>> at
>> org.jclouds.compute.domain.internal.TemplateBuilderImpl.build(TemplateBuilderImpl.java:653)
>>
>> at
>> org.apache.whirr.compute.BootstrapTemplate.build(BootstrapTemplate.java:77)
>>
>> at
>> org.apache.whirr.actions.BootstrapClusterAction.doAction(BootstrapClusterAction.java:101)
>>
>> at
>> org.apache.whirr.actions.ScriptBasedClusterAction.execute(ScriptBasedClusterAction.java:131)
>>
>> at
>> org.apache.whirr.ClusterController.bootstrapCluster(ClusterController.java:137)
>>
>> at
>> org.apache.whirr.ClusterController.launchCluster(ClusterController.java:113)
>>
>> at
>> org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:69)
>>
>> at
>> org.apache.whirr.cli.command.LaunchClusterCommand.run(LaunchClusterCommand.java:59)
>>
>> at org.apache.whirr.cli.Main.run(Main.java:69)
>>
>> at org.apache.whirr.cli.Main.main(Main.java:102)
>>
>> Caused by: java.util.concurrent.ExecutionException:
>> org.jclouds.http.HttpResponseException: ec2.ap-northeast-1.amazonaws.com
>> connecting to POST https://ec2.ap-northeast-1.amazonaws.com/ HTTP/1.1
>>
>> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>>
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>>
>> at
>> org.jclouds.concurrent.config.DescribedFuture.get(DescribedFuture.java:45)
>>
>> at
>> org.jclouds.aws.ec2.compute.suppliers.AWSEC2ImageSupplier.get(AWSEC2ImageSupplier.java:105)
>>
>> ... 22 more
>>
>>
>> and at the end of the trace is
>>
>>
>> Caused by: java.net.UnknownHostException:
>> ec2.ap-northeast-1.amazonaws.com
>>
>> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:223)
>>
>> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:431)
>>
>> at java.net.Socket.connect(Socket.java:527)
>>
>> at
>> com.sun.net.ssl.internal.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:570)
>>
>> at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
>>
>>  at sun.net.www.http.HttpClient.openServer(HttpClient.java:424)
>>
>> at sun.net.www.http.HttpClient.openServer(HttpClient.java:538)
>>
>> at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:276)
>>
>> at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
>>
>> at
>> sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:172)
>>
>> at
>> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:923)
>>
>> at
>> sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:158)
>>
>> at
>> sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1031)
>>
>> at
>> sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:230)
>>
>> at
>> org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.writePayloadToConnection(JavaUrlHttpCommandExecutorService.java:269)
>>
>> at
>> org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:243)
>>
>> at
>> org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:82)
>>
>> at
>> org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:157)
>>
>> at
>> org.jclouds.http.internal.BaseHttpCommandExecutorService$HttpResponseCallable.call(BaseHttpCommandExecutorService.java:135)
>>
>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>
>> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>
>>  at java.lang.Thread.run(Thread.java:695)
>>
>>
>> I am using the default hadoop-properties. The problem has persisted for
>> the last couple of hours.
>>
>> Any suggestions on how to resolve this and get the cluster up and running?
>>
>> --
>> thanks
>> ashish
>>
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>>



-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
