hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: uneven region distribution
Date Sat, 15 Feb 2014 03:43:14 GMT
Please take a look at http://hbase.apache.org/book.html#hbase_metrics.

You should pay attention to callQueueLength, compactionQueueLength,
readRequestsCount and writeRequestsCount.

Cheers


On Fri, Feb 14, 2014 at 7:13 PM, Rohit Kelkar <rohitkelkar@gmail.com> wrote:

> It could have been under load because I am not salting the keys. If I am
> able to reproduce this issue, which metrics should I capture to determine
> whether it was under load?
>
> - R
>
> On Friday, February 14, 2014, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > From the region server log: was server5 under heavy load?
> >
> >
> >    1. 2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper:
> >    We slept 99984ms instead of 3000ms, this is likely due to a long garbage
> >    collecting pause and it's usually bad, see
> >    http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> >    2. ...
> >    3. 2014-02-14 16:06:05,783 FATAL
> >    org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> >    server server5,60020,1392355987269: Unhandled exception:
> >    org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> >    currently processing server5,60020,1392355987269 as dead server
> >
> >
> >
> > On Fri, Feb 14, 2014 at 5:00 PM, Rohit Kelkar <rohitkelkar@gmail.com> wrote:
> >
> > > Thanks for your inputs,
> > > I am sharing the master log - http://pastebin.com/Xi9P6Ykr
> > > and the region server log of the failed region server -
> > > http://pastebin.com/1munghDv
> > >
> > > - R
> > >
> > >
> > > On Fri, Feb 14, 2014 at 6:24 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Looking at the bug fixes since 0.94.2, I wonder if you are experiencing
> > > > the following, which was fixed in 0.94.10:
> > > > HBASE-8432 a table with unbalanced regions will balance indefinitely
> > > >
> > > > Master log would tell us more.
> > > >
> > > >
> > > > On Fri, Feb 14, 2014 at 4:18 PM, Rohit Kelkar <rohitkelkar@gmail.com> wrote:
> > > >
> > > > > Sorry, I misstated the version; it's 0.94.2
> > > > >
> > > > > - R
> > > > >
> > > > >
> > > > > On Fri, Feb 14, 2014 at 5:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > bq.  it does not change the status of the assignments.
> > > > > >
> > > > > > Can you check / pastebin the master log to see what caused the
> > > > > > balancing to stop?
> > > > > >
> > > > > > bq. attributing the region server crash to the disproportionately
> > > > > > high number of regions on that server?
> > > > > >
> > > > > > Checking the region server log on server5 should give us more clues.
> > > > > >
> > > > > > bq. 0.92.4
> > > > > >
> > > > > > please consider upgrading :-)
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 14, 2014 at 3:52 PM, Rohit Kelkar <rohitkelkar@gmail.com> wrote:
> > > > > >
> > > > > > > I am using hbase version 0.92.4 on a 5 node cluster. I am seeing
> > > > > > > that a particular region server often crashes. A status 'simple'
> > > > > > > on hbase shell gives the following stats
> > > > > > >
> > > > > > >
> > > > > > > HBase Shell; enter 'help<RETURN>' for list of supported commands.
> > > > > > > Type "exit<RETURN>" to leave the HBase Shell
> > > > > > > Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
> > > > > > > status 'simple'
> > > > > > > 4 live servers
> > > > > > > server7:60020 1392017875910
> > > > > > > requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
> > > > > > > server4:60020 1392300859332
> > > > > > > requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
> > > > > > > server3:60020 1391583646998
> > > > > > > requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
> > > > > > > server6:60020 1391583647588
> > > > > > > requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
> > > > > > > 1 dead servers
> > > > > > > server5,60020,1392108515637
> > > > > > > Aggregate load: 1272, regions: 2417
> > > > > > >
> > > > > > > The dead region server has 2417 regions as opposed to 419, 379,
> > > > > > > 653, 966 regions on other servers. Am I right in attributing the
> > > > > > > region server crash to the disproportionately high number of
> > > > > > > regions on that server?
> > > > > > >
> > > > > > > If I invoke the balancer on hbase shell using the "balancer"
> > > > > > > command it returns true. But it does not change the status of the
> > > > > > > assignments.
> > > > > > >
> > > > > > > - R
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
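
The region counts quoted in the original message can be cross-checked with a
short script. The following is only a sketch in Python, not anything posted in
the thread: the names STATUS_OUTPUT, parse_status and summarize are made up
for illustration, and the input is the four live-server lines copied from the
quoted status 'simple' output. It sums the per-server figures and reports how
far the busiest server sits above the mean.

```python
import re

# Live-server lines copied from the `status 'simple'` output quoted in the thread.
STATUS_OUTPUT = """\
server7:60020 1392017875910 requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
server4:60020 1392300859332 requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
server3:60020 1391583646998 requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
server6:60020 1391583647588 requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
"""

# Hypothetical pattern for the line layout shown above; other HBase versions
# may print these fields differently.
LINE_RE = re.compile(
    r"^(\S+):\d+\s+\d+\s+requestsPerSecond=(\d+),\s*numberOfOnlineRegions=(\d+)",
    re.MULTILINE,
)


def parse_status(text):
    """Return {server: (requests_per_second, online_regions)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in LINE_RE.finditer(text)
    }


def summarize(servers):
    """Total the per-server figures and measure region skew against the mean."""
    requests = [req for req, _ in servers.values()]
    regions = [reg for _, reg in servers.values()]
    mean_regions = sum(regions) / len(regions)
    return {
        "total_requests": sum(requests),
        "total_regions": sum(regions),
        "mean_regions": mean_regions,
        "max_over_mean": max(regions) / mean_regions,
    }


servers = parse_status(STATUS_OUTPUT)
summary = summarize(servers)
for key, value in summary.items():
    print(key, value)
```

For the quoted figures, the four live servers add up to 1272 requests per
second and 2417 regions, the same numbers as the "Aggregate load: 1272,
regions: 2417" line. That suggests the 2417 figure may be a total across the
live servers rather than the region count of the dead server5, although the
skew between the live servers (379 versus 966 regions) is still visible.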
