hbase-user mailing list archives

From "ac@hsk.hk" ...@hsk.hk>
Subject Re: A region server stopped (timeout after trying to connect local Zookeeper)
Date Wed, 21 Nov 2012 13:29:54 GMT
Hi, 


I have the following line in /etc/hosts on all servers. Should I keep it, comment it out,
or do something else?

127.0.0.1       localhost

Please help.

Thanks
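
For context on the question above: `127.0.0.1 localhost` is the conventional loopback entry and is normally left in place. The case the HBase reference guide warns about is different, namely the machine's own hostname being mapped to a loopback address (some distributions add a `127.0.1.1 <hostname>` line). The following is a minimal sketch for spotting that case, not an official tool. The function names and the `SAMPLE_BAD` text are hypothetical; in particular, the `127.0.1.1 m144` line is an invented illustration and does not come from this thread.

```python
import ipaddress

# Names that are expected to point at a loopback address.
HARMLESS = {"localhost", "localhost.localdomain", "ip6-localhost", "ip6-loopback"}

def suspicious_names(hosts_text):
    """Return (ip, name) pairs where a non-localhost name maps to a loopback IP."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not a plain IP address; skip the line
        if ip.is_loopback:
            for name in fields[1:]:
                if name not in HARMLESS:
                    found.append((str(ip), name))
    return found

# The line quoted in this message: nothing suspicious is reported for it.
SAMPLE_OK = "127.0.0.1       localhost\n"

# Hypothetical problem case (NOT taken from the thread).
SAMPLE_BAD = "127.0.0.1 localhost\n127.0.1.1 m144\n"

if __name__ == "__main__":
    print(suspicious_names(SAMPLE_OK))
    print(suspicious_names(SAMPLE_BAD))
```

To check a real machine, the contents of /etc/hosts could be read from disk and passed to `suspicious_names`.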



On 21 Nov 2012, at 7:16 PM, ac@hsk.hk wrote:

> Hi,
> 
> 
> Please help!!
> 
> HBase version: 0.94
> ZooKeeper: 3.4.4
> 
> One of the region servers stopped shortly after HBase was started:
> 
> ### Checked jps after the HBase cluster was started and found the HRegionServer process (*** there is no ZooKeeper instance running on this server ***)
> $ jps
> 24767 Jps
> 18418 TaskTracker
> 24678 HRegionServer
> 18156 DataNode
> 
> ### Waited a while and checked jps again: the HRegionServer process was gone
> $ jps
> 18418 TaskTracker
> 24784 Jps
> 18156 DataNode
> 
> 
> ### Here are the settings in hbase-site.xml (hbase.cluster.distributed enabled, 3 ZooKeepers set up, timeout = 60000)
> <property>
> <name>hbase.cluster.distributed</name>
> <value>true</value>
> </property>
> 
> <property>
> <name>hbase.ZooKeeper.quorum</name>
> <value>m146,m145,m143</value>
> </property>
> 
> <property>
> <name>zookeeper.session.timeout</name>
> <value>60000</value>
> </property>
> 
> 
> ### hbase-env.sh also tells HBase not to manage a local ZooKeeper instance
> export HBASE_MANAGES_ZK=false
> 
> 
> ### This server can connect to all 3 ZooKeepers:
> ./zkCli.sh -server m145,m146,m143  	==>  [zk: m145,m146,m143(CONNECTED) 0]
> 
> 
> ### Checked the HBase log file and found something odd: it seems to have tried to connect to a local ZooKeeper
> 2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=60000 watcher=regionserver:60020
> 
> 2012-11-21 17:31:33,254 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> 
> 2012-11-21 17:31:33,254 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
> 2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 60010ms for sessionid 0x0, closing socket connection and attempting reconnect
> 
> 2012-11-21 17:32:33,362 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/master
> 
> ......
> 
> 2012-11-21 17:34:33,570 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
> 2012-11-21 17:34:33,571 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:60020 Unable to set watcher on znode /hbase/master
> 2012-11-21 17:34:33,573 ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020 Received unexpected KeeperException, re-throwing exception
> 2012-11-21 17:34:33,573 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server ......
> 2012-11-21 17:34:33,576 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
> 
> 2012-11-21 17:34:36,580 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server m144,60020,1353490232962: Initialization of RS failed.  Hence aborting RS.
> java.io.IOException: Received the shutdown message while waiting.
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:623)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:598)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:560)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:669)
> 	at java.lang.Thread.run(Thread.java:662)
> 2012-11-21 17:34:36,581 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
> 
> 
> Please help!
> QUESTION: Is this a bug, or do I need to check something else?
> 
> Thanks
> 
> 
> 
> 
> 
> 
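
One thing worth ruling out, given that the log above shows `connectString=localhost:2181` rather than the configured quorum: Hadoop-style configuration keys are case-sensitive, and the documented key is `hbase.zookeeper.quorum` (all lowercase). If the name really is spelled `hbase.ZooKeeper.quorum` in the file, and is not just an artifact of how the mail was formatted, HBase would not read it and would fall back to its default quorum of `localhost`. The following is a minimal sketch for catching that kind of mismatch, under the assumption that only the three keys quoted in this thread need checking. `KNOWN_KEYS`, `find_miscased_keys`, and `SAMPLE` are hypothetical names, and `SAMPLE` simply repeats the properties quoted above.

```python
import xml.etree.ElementTree as ET

# Documented spellings of the keys quoted in this thread (assumed to be
# the only ones of interest here).
KNOWN_KEYS = (
    "hbase.cluster.distributed",
    "hbase.zookeeper.quorum",
    "zookeeper.session.timeout",
)

def find_miscased_keys(site_xml):
    """Return (found, expected) pairs for names that match a known key
    only when letter case is ignored."""
    by_lower = {key.lower(): key for key in KNOWN_KEYS}
    problems = []
    for name in ET.fromstring(site_xml).iter("name"):
        found = (name.text or "").strip()
        expected = by_lower.get(found.lower())
        if expected is not None and expected != found:
            problems.append((found, expected))
    return problems

# The properties as quoted in the message, wrapped in a root element.
SAMPLE = """<configuration>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.ZooKeeper.quorum</name><value>m146,m145,m143</value></property>
  <property><name>zookeeper.session.timeout</name><value>60000</value></property>
</configuration>"""

if __name__ == "__main__":
    for found, expected in find_miscased_keys(SAMPLE):
        print(f"{found!r} should probably be {expected!r}")
```

If the check reports nothing for the real hbase-site.xml, the next thing to verify would be that the region server is actually started with the configuration directory containing that file.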

