hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: A region server stopped (timeout after trying to connect local Zookeeper)
Date Thu, 22 Nov 2012 00:39:41 GMT
I think the main difference is the uppercase in the property name:
hbase.ZooKeeper.quorum instead of hbase.zookeeper.quorum. Property
names in hbase-site.xml seem to be case sensitive (which is normal in
the Java and Unix world).

You might want to retry by putting back the uppercase to see if this
was the issue.
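
To make the case-sensitivity point concrete, here is a minimal sketch in
Python. It is not part of HBase: the function name and the short list of
expected keys are assumptions taken from the property names mentioned in
this thread. It scans the text of an hbase-site.xml for property names
that match an expected key only when case is ignored.

```python
import xml.etree.ElementTree as ET

# Assumption: a hand-picked set of expected property names, limited to the
# ones that appear in this thread. Extend it for a real configuration.
KNOWN_KEYS = {
    "hbase.zookeeper.quorum",
    "hbase.cluster.distributed",
    "zookeeper.session.timeout",
    "hbase.regionserver.codecs",
}


def find_miscased_properties(xml_text, known_keys=KNOWN_KEYS):
    """Return (name_in_file, expected_name) pairs that differ only by case."""
    by_lower = {key.lower(): key for key in known_keys}
    root = ET.fromstring(xml_text)
    suspects = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        expected = by_lower.get(name.lower())
        # Flag names that match a known key only after lowercasing.
        if expected is not None and expected != name:
            suspects.append((name, expected))
    return suspects


# Sample input modelled on the configuration quoted in this thread.
SAMPLE = """<configuration>
  <property>
    <name>hbase.ZooKeeper.quorum</name>
    <value>m146,m145,m143</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for found, expected in find_miscased_properties(SAMPLE):
        print(f"{found!r} should probably be {expected!r}")
```

On the sample above it flags hbase.ZooKeeper.quorum and leaves
zookeeper.session.timeout alone. A name that is simply unknown is not
reported, since the sketch only looks for case mismatches.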

JM

2012/11/21, ac@hsk.hk <ac@hsk.hk>:
> Hi
>
> I changed the order of ZooKeepers in the value of hbase.zookeeper.quorum,
> from "m146,m145,m143" to "m143,m145,m146", set timeout from 60000 to 70000,
> and commented out the lzo property. It works now; here is the diff:
>
> 1) $ diff hbase-site.xml hbase-site.xml.xxx
> 41,44c41,43
> <
> < <property>
> < <name>hbase.zookeeper.quorum</name>
> < <value>m143,m145,m146</value>
> ---
>> <property>
>> <name>hbase.ZooKeeper.quorum</name>
>> <value>m146,m145,m143</value>
> 49c48,55
> < <value>70000</value>
> ---
>> <value>60000</value>
>> </property>
>>
>> <!--
>> /**
>> <property>
>> <name>hbase.regionserver.codecs</name>
>> <value>lzo,gz</value>
> 50a57,58
>> **/
>> -->
>
> The above is the only change today.
>
>
> 2) hbase log:
> 2012-11-22 07:26:19,431 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=m145:2181,m143:2181,m146:2181
> sessionTimeout=70000 watcher=regionserver:6$
>
>
> I don't know why but it works now. It seems that hbase somehow could not
> read in hbase-site.xml correctly.
>
>
> Thanks
>
>
>
>
> On 22 Nov 2012, at 7:51 AM, Jean-Marc Spaggiari wrote:
>
>> Can you do JPS on your master and look at the logs too?
>>
>> Another thing: can you try with hbase.zookeeper.quorum instead of
>> hbase.ZooKeeper.quorum?
>>
>> 2012/11/21, ac@hsk.hk <ac@hsk.hk>:
>>> Hi,
>>>
>>> Here are my HBase configuration and test:
>>>
>>> 1) {$HBASE_HOME}hbase/conf/hbase-site.xml
>>> <property>
>>> <name>hbase.ZooKeeper.quorum</name>
>>> <value>m146,m145,m143</value>
>>> </property>
>>>
>>> <property>
>>> <name>zookeeper.session.timeout</name>
>>> <value>60000</value>
>>> </property>
>>>
>>>
>>> 2) {$HBASE_HOME}hbase/conf/hbase-env.sh
>>> export HBASE_MANAGES_ZK=false
>>>
>>>
>>> 3) I used " {$ZK_HOME}/bin/zkCli.sh -server m145,m146,m143"  to test the
>>> connection, it worked
>>> [zk: m145,m146,m143(CONNECTED) 0]
>>>
>>>
>>> 4) From the logs, I found that the connectString was odd: the RegionServer
>>> did not use the setting of "hbase.ZooKeeper.quorum" in conf/hbase-site.xml.
>>> It seemed that it always used the default and tried to connect to
>>> "localhost:2181" in the distributed cluster:
>>>
>>> 	2012-11-21 17:21:42,299 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>> client connection, connectString=localhost:2181 sessionTimeout=60000
>>> watcher=regionserver:60020
>>> 	...
>>> 	2012-11-21 17:21:42,313 INFO org.apache.zookeeper.ClientCnxn: Opening
>>> socket connection to server localhost/127.0.0.1:2181. Will not attempt
>>> to
>>> authenticate using SASL (Unable to locate a login configura$
>>> 	...
>>> 	2012-11-21 17:21:42,316 WARN org.apache.zookeeper.ClientCnxn: Session
>>> 0x0
>>> for server null, unexpected error, closing socket connection and
>>> attempting
>>> reconnect java.net.ConnectException: Connection refused
>>> 	...  (remark: it tried above 3 times, then had FATAL error as follows)
>>>
>>> 	2012-11-21 17:21:57,846 ERROR
>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: regionserver:60020
>>> Received unexpected KeeperException, re-throwing exception
>>> 	...
>>> 	2012-11-21 17:21:57,847 FATAL
>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>> server
>>> ...
>>>
>>>
>>>
>>> Please help.
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>> On 22 Nov 2012, at 1:22 AM, Jean-Marc Spaggiari wrote:
>>>
>>>> Hi,
>>>>
>>>> What do you have on your HBase configuration? Are you passing the name
>>>> of the Quorum servers?
>>>> $ cat conf/hbase-site.xml
>>>> ......
>>>> </property>
>>>>   <property>
>>>>     <name>hbase.zookeeper.quorum</name>
>>>>     <value>cube,latitude,node3</value>
>>>>     <description>Comma separated list of servers in the ZooKeeper Quorum.
>>>>     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>>>>     By default this is set to localhost for local and pseudo-distributed modes
>>>>     of operation. For a fully-distributed setup, this should be set to a full
>>>>     list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
>>>>     this is the list of servers which we will start/stop ZooKeeper on.
>>>>     </description>
>>>>   </property>
>>>> .....
>>>>
>>>> 2012/11/21, ac@hsk.hk <ac@hsk.hk>:
>>>>> Hi,
>>>>>
>>>>>
>>>>> I have the following line in /etc/hosts in all servers, should I keep it
>>>>> or comment it out or ...?
>>>>>
>>>>> 127.0.0.1       localhost
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>> On 21 Nov 2012, at 7:16 PM, ac@hsk.hk wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> Please help!!
>>>>>>
>>>>>> HBase version: 0.94
>>>>>> ZooKeeper: 3.4.4
>>>>>>
>>>>>> One of the region servers stopped very quickly after HBase was
>>>>>> started:
>>>>>>
>>>>>> ### Checked JPS after the HBase cluster was started and could find the
>>>>>> HRegionServer process (*** there is no ZooKeeper instance running on
>>>>>> this server ***)
>>>>>> $ jps
>>>>>> 24767 Jps
>>>>>> 18418 TaskTracker
>>>>>> 24678 HRegionServer
>>>>>> 18156 DataNode
>>>>>>
>>>>>> ### Waited a while and checked JPS again; the HRegionServer process was gone
>>>>>> $ jps
>>>>>> 18418 TaskTracker
>>>>>> 24784 Jps
>>>>>> 18156 DataNode
>>>>>>
>>>>>>
>>>>>> ### Here is the setting in hbase-site.xml ( enabled
>>>>>> hbase.cluster.distributed, set up 3 ZooKeepers, timeout= 60000)
>>>>>> <property>
>>>>>> <name>hbase.cluster.distributed</name>
>>>>>> <value>true</value>
>>>>>> </property>
>>>>>>
>>>>>> <property>
>>>>>> <name>hbase.ZooKeeper.quorum</name>
>>>>>> <value>m146,m145,m143</value>
>>>>>> </property>
>>>>>>
>>>>>> <property>
>>>>>> <name>zookeeper.session.timeout</name>
>>>>>> <value>60000</value>
>>>>>> </property>
>>>>>>
>>>>>>
>>>>>> ### hbase-env.sh also tells HBASE not to manage local instance of
>>>>>> ZooKeeper
>>>>>> export HBASE_MANAGES_ZK=false
>>>>>>
>>>>>>
>>>>>> ###This server can connect to the 3 ZooKeepers,
>>>>>> ./zkCli.sh -server m145,m146,m143  	==>  [zk:
>>>>>> m145,m146,m143(CONNECTED)
>>>>>> 0]
>>>>>>
>>>>>>
>>>>>> ### checked the hbase log file, found something odd,  seemed that it
>>>>>> tried to connect local ZooKeeper
>>>>>> 2012-11-21 17:30:33,066 INFO org.apache.zookeeper.ZooKeeper:
>>>>>> Initiating
>>>>>> client connection, connectString=localhost:2181 sessionTimeout=60000
>>>>>> watcher=regionserver:60020
>>>>>>
>>>>>> 2012-11-21 17:31:33,254 WARN
>>>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
>>>>>> transient
>>>>>> ZooKeeper exception:
>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>>>
>>>>>> 2012-11-21 17:31:33,254 INFO
>>>>>> org.apache.hadoop.hbase.util.RetryCounter:
>>>>>> Sleeping 2000ms before retry #1...
>>>>>> 2012-11-21 17:32:33,262 INFO org.apache.zookeeper.ClientCnxn: Client
>>>>>> session timed out, have not heard from server in 60010ms for
>>>>>> sessionid
>>>>>> 0x0, closing socket connection and attempting reconnect
>>>>>>
>>>>>> 2012-11-21 17:32:33,362 WARN
>>>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly
>>>>>> transient
>>>>>> ZooKeeper exception:
>>>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>>>
>>>>>> ......
>>>>>>
>>>>>> 2012-11-21 17:34:33,570 ERROR
>>>>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper
>>>>>> exists
>>>>>> failed after 3 retries
>>>>>> 2012-11-21 17:34:33,571 WARN
>>>>>> org.apache.hadoop.hbase.zookeeper.ZKUtil:
>>>>>> regionserver:60020 Unable to set watcher on znode /hbase/master
>>>>>> 2012-11-21 17:34:33,573 ERROR
>>>>>> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
>>>>>> regionserver:60020
>>>>>> Received unexpected KeeperException, re-throwing exception
>>>>>> 2012-11-21 17:34:33,573 FATAL
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>>>> server
>>>>>> ......
>>>>>> 2012-11-21 17:34:33,576 FATAL
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer
>>>>>> abort:
>>>>>> loaded coprocessors are: []
>>>>>>
>>>>>> 2012-11-21 17:34:36,580 FATAL
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>>>>>> server
>>>>>> m144,60020,1353490232962: Initialization of RS failed.  Hence
>>>>>> aborting
>>>>>> RS.
>>>>>> java.io.IOException: Received the shutdown message while waiting.
>>>>>> 	at
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:623)
>>>>>> 	at
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:598)
>>>>>> 	at
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:560)
>>>>>> 	at
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:669)
>>>>>> 	at java.lang.Thread.run(Thread.java:662)
>>>>>> 2012-11-21 17:34:36,581 FATAL
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer
>>>>>> abort:
>>>>>> loaded coprocessors are: []
>>>>>>
>>>>>>
>>>>>> Please help!
>>>>>> QUESTION: Is it a bug, or do I need to check something else?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>
>>>
>
>
