hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: .META. Table
Date Tue, 04 Jan 2011 05:21:35 GMT
Well, you'd think that all .META. accesses would be cache reads for
the most part, so the impact on this server should be minimal.  Perhaps
this is not the case.  If you look at the regionserver log for the
server hosting .META., what does the LRU cache log line say?  Lots of
evictions?  Lots of cache hits?  Or the opposite?  If you do a listing
on the .META. region in the filesystem, how many files are under the
.META./info directory?  (./bin/hadoop fs -lsr
/HBASE.ROOTDIR/.META./info)  Are there many more than one?
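A rough sketch of counting those store files from the `-lsr` output above; the sample listing lines, paths, and the /hbase root dir below are made-up stand-ins, not output from a real cluster:

```shell
# Stand-in for real `hadoop fs -lsr /HBASE.ROOTDIR/.META./info` output
# (fabricated paths and sizes for illustration only).
lsr_output='-rw-r--r--   3 hbase hbase   12345 2011-01-03 12:00 /hbase/.META./info/378292384
-rw-r--r--   3 hbase hbase   67890 2011-01-03 12:05 /hbase/.META./info/482938475
drwxr-xr-x   - hbase hbase       0 2011-01-03 12:00 /hbase/.META./info'

# Keep only plain files (lines starting with "-") and count them; many
# more than one store file would suggest compactions are falling behind.
file_count=$(printf '%s\n' "$lsr_output" | grep -c '^-')
echo "$file_count"
```

On a live cluster you'd pipe the real `hadoop fs -lsr` output into the same `grep -c '^-'` instead of the canned string.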

When transactions go to zero, anything in the regionserver log of one
of the servers that goes to zero?  A message about 'blocking'?
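One way to check for those 'blocking' messages is a quick grep over the regionserver log; the sample lines and their exact wording below are assumptions for illustration, not actual HBase log output:

```shell
# Fabricated sample regionserver log lines (format is an assumption).
log='2011-01-03 18:29:01 INFO  HRegion: Blocking updates for region foo: memstore size is >= blocking limit
2011-01-03 18:29:02 INFO  HRegion: Unblocking updates for region foo
2011-01-03 18:30:10 INFO  HRegion: Blocking updates for region bar: memstore size is >= blocking limit'

# Count how often updates were blocked; a burst of these around the time
# transactions drop to zero would point at memstore pressure.
blocked=$(printf '%s\n' "$log" | grep -c 'Blocking updates')
echo "$blocked"
```

Against a real log you'd run `grep -c 'Blocking updates' regionserver.log` and look at the timestamps of the hits.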

Are your clients short-lived?  Clients cache region locations, so in
general they should be going to .META. fairly rarely.  Is there lots
of churn in region locations for some reason; i.e. are you making
regions at a tidy clip... tens per second or something?
You have something like 5k regions on ten servers?
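As a back-of-the-envelope check using the figures floated in this thread (5k regions, ten servers):

```shell
# Numbers taken from the question above; purely illustrative arithmetic.
regions=5000
servers=10
echo $((regions / servers))   # average regions hosted per server
```

Five hundred regions per server is a lot of location entries for clients to look up if they are not caching, which is why short-lived clients would hammer .META.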

Was it ulimit you had misconfigured?  If so, the cluster will have
acted in strange ways.  If ulimit was misconfigured when you had 1G
regions, then this might explain oddness.  1G regions would mean less


On Mon, Jan 3, 2011 at 6:31 PM, Wayne <wav100@gmail.com> wrote:
> We are finding that the node that is responsible for the .META. table is
> going into GC storms, causing the entire cluster to go AWOL until it recovers.
> Isn't the master supposed to serve up the .META. table? Is it possible to
> pin this table somewhere that only handles this?  Our master server and
> zookeeper servers are separate from our 10 region server nodes but in the
> end one of the region servers is responsible for the .META. table and we
> sometimes see all requests drop to zero except on the server handling the
> .META. table and the requests jump up to the number of regions+1 and back
> down. This has lasted for as long as 5 minutes before the cluster goes back
> to responding to requests normally. When we had a 1GB region size with LZO
> it was 90% in this AWOL state.
> Do we have our cluster set up correctly? Is it supposed to behave like this?
> Thanks for any advice that can be provided.
