hbase-user mailing list archives

From Dan Brodsky <danbrod...@gmail.com>
Subject Re: Regionservers not connecting to master
Date Fri, 02 Nov 2012 18:13:13 GMT
Ram,

I wanted to follow up with you, since your comment below helped me.

It turns out that the ZK configuration files somehow got changed (reverted
to their default values?), and I'm not sure who/when/how. The zoo.cfg files
didn't have the list of quorum peers, and the myid files that told each ZK
peer their ordinal value had been deleted. So, effectively, I had three ZK
standalone servers, instead of one quorum.
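
For anyone who hits the same symptom, here is a minimal sketch of the check I could have run on each ZK box. It assumes the usual layout: zoo.cfg lists the peers as server.N=host:port:port lines, and the myid file in dataDir holds that peer's N. The function names and the example hostnames are made up for illustration; this is not a ZooKeeper tool.

```python
import re

# Matches quorum entries such as "server.1=zk1.example.com:2888:3888".
SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*(\S+)")


def quorum_peers(zoo_cfg_text):
    """Return {id: address} for every server.N line in the zoo.cfg text."""
    peers = {}
    for line in zoo_cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = SERVER_LINE.match(line)
        if match:
            peers[int(match.group(1))] = match.group(2)
    return peers


def diagnose(zoo_cfg_text, myid_text):
    """Describe how a peer would come up given its zoo.cfg and myid contents.

    myid_text is the content of the myid file, or None if the file is missing.
    """
    peers = quorum_peers(zoo_cfg_text)
    if not peers:
        return "standalone: zoo.cfg has no server.N entries"
    if myid_text is None:
        return "broken: quorum configured but the myid file is missing"
    try:
        my_id = int(myid_text.strip())
    except ValueError:
        return "broken: myid does not contain an integer"
    if my_id not in peers:
        return "broken: myid %d is not listed in zoo.cfg" % my_id
    return "quorum: peer %d of %d" % (my_id, len(peers))


if __name__ == "__main__":
    # Hypothetical three-peer config; hostnames are placeholders.
    good_cfg = (
        "tickTime=2000\n"
        "dataDir=/var/lib/zookeeper\n"
        "clientPort=2181\n"
        "server.1=zk1.example.com:2888:3888\n"
        "server.2=zk2.example.com:2888:3888\n"
        "server.3=zk3.example.com:2888:3888\n"
    )
    print(diagnose(good_cfg, "2\n"))
    # A default-style config with no peer list, which is what I ended up with.
    print(diagnose("tickTime=2000\nclientPort=2181\n", None))
```

Running something like this against the real zoo.cfg and myid contents on each of the three boxes would have flagged all of them as standalone.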

Problem fixed; HBase is happy again.

Cheers,

Dan



On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan <
ramkrishna.vasudevan@huawei.com> wrote:

> Can you try starting some of the regionservers that are not connecting at
> all? Maybe start 2 of them.
> Observe the master logs. See whether they say
> 'Waiting for RegionServers to checkin'.
>
> Just to confirm: are your ZK IP and port correct throughout the cluster? If
> it is a multitenant cluster, maybe the other regionservers are connecting
> to some other ZK cluster?
> Wild guess :)
>
> Regards
> Ram
> > -----Original Message-----
> > From: Dan Brodsky [mailto:danbrodsky@gmail.com]
> > Sent: Wednesday, October 17, 2012 6:31 PM
> > To: user@hbase.apache.org
> > Subject: Regionservers not connecting to master
> >
> > Good morning,
> >
> > I have a 10 node Hadoop/Hbase cluster, plus a namenode VM, plus three
> > Zookeeper quorum peers (one on the namenode, one on a dedicated ZK
> > peer VM, and one on a third box). All 10 HDFS datanodes are also Hbase
> > regionservers.
> >
> > Several weeks ago, we had six HDFS datanodes go offline suddenly (with
> > no meaningful error messages), and since then, I have been unable to
> > get all 10 regionservers to connect to the Hbase master. I've tried
> > bringing the cluster down and rebooting all the boxes, but no joy. The
> > machines are all running, and hbase-regionserver appears to start
> > normally on each one.
> >
> > Right now, my master status page (http://namenode:60010) shows 3
> > regionservers online. There are also dozens of regions in transition
> > listed on the status page (in the PENDING_OPEN state), but each of
> > those is on one of the regionservers already online.
> >
> > The 7 other regionservers' log files show a successful connection to
> > one ZK peer, followed by a regular trail of these messages:
> >
> > 2012-10-17 12:36:08,394 DEBUG
> > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
> > MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
> > hitRatio=0cachingAccesses=0, cachingHits=0,
> > cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> >
> > If I had to wager a guess, it seems like the 7 offline regionservers
> > are not connecting to other ZK peers, but there isn't anything in the
> > ZK logs to indicate why.
> >
> > Thoughts?
> >
> > Dan
>
>
