hbase-user mailing list archives

From "Brush,Ryan" <RBR...@CERNER.COM>
Subject NoRouteToHostException causes Master abort when the RegionServer hosting ROOT is not available
Date Fri, 01 Apr 2011 15:48:44 GMT
This happens under conditions similar to HBASE-3617 but is a distinct problem. When the region
server hosting ROOT isn't available during a restart, the NoRouteToHostException propagates all
the way up the call stack and causes the master to abort.  It looks like this could be addressed
by catching NoRouteToHostException somewhere along that path and treating the node/region server
as offline.
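
To make the suggestion concrete, here is a minimal sketch in plain Java of treating a connect-time
NoRouteToHostException as "server offline" instead of letting it propagate. This is not HBase code:
the class ServerReachability, its methods, and the Status enum are hypothetical names made up for
illustration, and including ConnectException in the same bucket is my assumption rather than
something the stack trace shows.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;

/** Hypothetical helper (not part of HBase): classifies connection failures. */
final class ServerReachability {
    enum Status { ONLINE, OFFLINE }

    private ServerReachability() {}

    /**
     * Returns true for failures that suggest the remote host is down or
     * unreachable. Treating ConnectException this way is an assumption.
     */
    static boolean indicatesOffline(IOException e) {
        return e instanceof NoRouteToHostException
            || e instanceof ConnectException;
    }

    /**
     * Tries a TCP connect and reports OFFLINE for "host is gone" failures
     * instead of propagating them. Any other IOException is rethrown.
     */
    static Status probe(String host, int port, int timeoutMillis) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Status.ONLINE;
        } catch (IOException e) {
            if (indicatesOffline(e)) {
                return Status.OFFLINE; // caller can consider the server offline and move on
            }
            throw e;
        }
    }
}
```

A caller in the position of the master would check for OFFLINE and proceed as if the server had
died, rather than aborting. Where exactly such a check would belong in the real call path is an
open question.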

I applied the patch from HBASE-3617 and it didn't fix the problem I'm seeing, which I expected
given the stack trace below.  Assuming this reasoning is correct, does this merit a separate
JIRA?  It does seem critical, in that the failure of a single node is preventing us from bringing
up our cluster.

2011-04-01 10:15:19,472 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=2, stopped=false, count of regions out on cluster=0
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop03.northamerica.cerner.net,60020,1301665635981 belongs to an existing region server
2011-04-01 10:15:19,486 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://iphadoop01:9000/hbase/.logs/iphadoop05.northamerica.cerner.net,60020,1301665659785 belongs to an existing region server
2011-04-01 10:15:22,508 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
java.net.NoRouteToHostException: No route to host
     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
     at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
     at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
     at $Proxy6.getProtocolVersion(Unknown Source)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:385)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.waitForRootServerConnection(CatalogTracker.java:211)
     at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyRootRegionLocation(CatalogTracker.java:458)
     at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:425)
     at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:383)
     at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
2011-04-01 10:15:22,510 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-04-01 10:15:22,510 DEBUG org.apache.hadoop.hbase.master.HMaster: Stopping service threads

