hbase-user mailing list archives

From Dan Brodsky <danbrod...@gmail.com>
Subject Re: Regionservers not connecting to master
Date Wed, 17 Oct 2012 17:35:18 GMT
Well, slight change: only 1 of the ZK peers happens to work. When a RS
connects to the other 2, it doesn't go further than that. The 1 ZK
node that happens to work is the one that runs on the same VM as the

Sounds like it could be network connectivity issues, so I'm going to
investigate that a bit further, but other suggestions are welcome.
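
One way to check basic reachability of each ZK peer from a regionserver host is to send ZooKeeper's four-letter "ruok" command to the client port and see whether the peer answers "imok". The following is a minimal sketch, not a definitive diagnostic. It assumes the default client port 2181 and that four-letter commands are enabled on the peers. The function names are made up for illustration.

```python
import socket


def zk_four_letter(host, port=2181, cmd="ruok", timeout=3.0):
    """Send a ZooKeeper four-letter command; return the reply text, or None on any network error."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(cmd.encode("ascii"))
            chunks = []
            # The server closes the connection after replying, so read until EOF.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks).decode("ascii", errors="replace")
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return None


def check_quorum(peers, port=2181):
    """Map each peer host name to True if it answered 'imok', else False."""
    return {host: zk_four_letter(host, port) == "imok" for host in peers}
```

Running check_quorum with the three peer host names (placeholders here, since the thread does not give them) from a regionserver that fails to come online would show whether any peer is unreachable from that machine. A peer that answers "imok" is only known to be running and reachable; it may still be out of sync with the quorum, which the "stat" command can help reveal.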

On Wed, Oct 17, 2012 at 1:29 PM, Dan Brodsky <danbrodsky@gmail.com> wrote:
> Ram,
> Thanks for your suggestions.
> The datanodes are all built using the same image, so I know they're
> all pointed to the same ZK nodes.
> I monitored all three ZK logs, the master log, and the regionserver
> log for each RS I was trying to bring back online. I'm glad I have a
> big screen. :-) Here is what I found:
> Whenever a regionserver connects to one particular ZK peer *first*, it
> never goes online. The ZK log shows a successful connection
> negotiating a timeout value, and the RS's log shows a successful ZK
> connection, but then it just sits there.
> When a regionserver starts up and connects to one of the other two ZK
> peers first, it connects to a second one successfully, then contacts
> the master, and it comes up and all is happy.
> So the problem of regionservers not connecting to master only happens
> when the RS tries one particular ZK node as its first ZK connection.
> But the logs aren't helpful for diagnosing further than that.
> Additional thoughts?
> On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan
> <ramkrishna.vasudevan@huawei.com> wrote:
>> Can you try starting any of the regionservers that are not connecting at
>> all? Maybe start 2 of them.
>> Observe the master logs. See whether they say
>> 'Waiting for RegionServers to checkin'.
>> Just to confirm: are your ZK IP and port correct throughout the cluster? If
>> this is a multitenant cluster, maybe the other regionservers are connecting
>> to some other ZK cluster?
>> Wild guess :)
>> Regards
>> Ram
>>> -----Original Message-----
>>> From: Dan Brodsky [mailto:danbrodsky@gmail.com]
>>> Sent: Wednesday, October 17, 2012 6:31 PM
>>> To: user@hbase.apache.org
>>> Subject: Regionservers not connecting to master
>>> Good morning,
>>> I have a 10 node Hadoop/Hbase cluster, plus a namenode VM, plus three
>>> Zookeeper quorum peers (one on the namenode, one on a dedicated ZK
>>> peer VM, and one on a third box). All 10 HDFS datanodes are also Hbase
>>> regionservers.
>>> Several weeks ago, we had six HDFS datanodes go offline suddenly (with
>>> no meaningful error messages), and since then, I have been unable to
>>> get all 10 regionservers to connect to the Hbase master. I've tried
>>> bringing the cluster down and rebooting all the boxes, but no joy. The
>>> machines are all running, and hbase-regionserver appears to start
>>> normally on each one.
>>> Right now, my master status page (http://namenode:60010) shows 3
>>> regionservers online. There are also dozens of regions in transition
>>> listed on the status page (in the PENDING_OPEN state), but each of
>>> those is on one of the regionservers already online.
>>> The 7 other regionservers' log files show a successful connection to
>>> one ZK peer, followed by a regular trail of these messages:
>>> 2012-10-17 12:36:08,394 DEBUG
>>> org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
>>> MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
>>> hitRatio=0cachingAccesses=0, cachingHits=0,
>>> cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
>>> If I had to wager a guess, it seems like the 7 offline regionservers
>>> are not connecting to other ZK peers, but there isn't anything in the
>>> ZK logs to indicate why.
>>> Thoughts?
>>> Dan
