hbase-user mailing list archives

From "Rong-en Fan" <gra...@gmail.com>
Subject Re: too busy host causes NotServingRegion exception?
Date Sat, 19 Apr 2008 15:24:27 GMT
On Fri, Apr 18, 2008 at 11:39 PM, Michael Stack <stack@powerset.com> wrote:
> ....
>  > 08/04/18 01:51:15 starting compaction
>  > 08/04/18 01:51:22 region closed
>  I'd guess a split has just happened and that it was responsible for the
>  close of the region.
>  > 08/04/18 01:51:41 NotServingRegion Exception
>  > 08/04/18 01:51:47 compaction done
>  > 08/04/18 01:51:51 NotServingRegion Exception
>  > 08/04/18 01:52:01 NotServingRegion Exception
>  > 08/04/18 01:52:11 NotServingRegion Exception
>  > 08/04/18 01:52:21 NotServingRegion Exception
>  These 'exceptions' happen while there is no region available with the
>  requested row.  I'd guess during this time, the master is being told of the
>  split, it then tells other regionservers to open the new split daughters.
>  > 08/04/18 01:52:47 open the region in question
>  > 08/04/18 01:52:47 region available
>  It's a little distressing that it took a minute for the region to come back
>  on line.
>  > the master log somehow got truncated, IIRC, the master tried to assign the
>  > region to this region server some where between 01:51:22 and 01:51:41.
>  Out of interest, where are these log messages coming out of?  Out of a .out
>  file or out of a .log file?

It's from .out. I'm not sure why, but after switching to trunk it no
longer generates .log files, only .out.

>  > From my understanding, this region server is a little busy so it does not
>  > accept the assignment from the master. I'm wondering if this is caused by
>  > a too-busy regionserver (the requests per second on each region server are
>  > about 1000), and if so, which configuration variables should I tune?
>  If you are doing a bunch of splitting, there may be a queue of regions to
>  open at the regionserver.  Currently they are processed serially.  Can take
>  some time.  Do you have DEBUG enabled so you can see more of what's going on?
>  (There may be an issue in TRUNK setting this.)

IIRC, I enabled DEBUG via the web interface, but it does not seem to
take effect. I really don't want to set DEBUG in log4j.properties since
it will generate lots of debug messages. But if that's the only way
for trunk, then I can do that.

I will try to reproduce my situation with DEBUG enabled and get back to
you later (probably sometime next week).

>  > In addition, what would be the best practice when writing a client in
>  > Java to deal with such exceptions (as NotServingRegion should be common
>  > on a very busy HBase instance, I think)?
>  Does this come out at your client?  If so, and it's looking like the
>  wanted-region eventually comes on-line, try upping
>  hbase.client.retries.number.

Yes, it shows up at my client. The log lines quoted above are the
corresponding entries from the region server's log.
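
For the client-side question, here is a rough sketch of an application-level
retry loop with exponential backoff, in case raising
hbase.client.retries.number alone is not enough. Everything in it is
hypothetical and for illustration only: RetryingCall, callWithRetries and
TransientRegionException are made-up names, not HBase APIs, and the exception
class merely stands in for whatever "region not serving" error the client
surfaces. In the log above the region was offline from 01:51:22 to 01:52:47
(about 85 seconds), so the total wait across all attempts would have to
exceed that for the call to succeed.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of HBase: retries an operation with
// exponential backoff while a region is temporarily offline.
class RetryingCall {

    // Stand-in for the client-side "region not serving" failure (assumption).
    static class TransientRegionException extends Exception {
        TransientRegionException(String message) {
            super(message);
        }
    }

    // Runs op up to maxAttempts times. After each transient failure it
    // sleeps, doubling the pause each time up to maxPauseMs. If every
    // attempt fails, the last transient failure is rethrown.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts,
                                 long initialPauseMs, long maxPauseMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long pauseMs = initialPauseMs;
        TransientRegionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (TransientRegionException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, give up
                }
                Thread.sleep(pauseMs);
                pauseMs = Math.min(pauseMs * 2, maxPauseMs);
            }
        }
        throw last;
    }
}
```

Only the transient exception type is retried; any other exception propagates
immediately, so real errors are not hidden behind the backoff.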

Speaking of tuning the retries number, I'm wondering if people who
load lots of data into HBase can share some observations/parameter
tuning? It would be very helpful to others like me.

>  > BTW, I was getting lots of different strange failures when doing the same
>  > thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk,
>  > I only get the error above. It seems there are no more mysterious exceptions
>  > :-D
>  Can we see them please?  We're operating under the perhaps false notion that
>  our releases are the most stable hbase.

I don't have them at hand, but from memory, most of them were socket
timeouts without any obvious log entries from HBase. It also ate my memory
and caused Java to spend too much time on GC (thank you, HBASE-512!). As
the first HBase I tried was 0.1.x, I cannot really comment on previous
versions. In addition, as the new HBase requires a new Hadoop, I may have
encountered some bugs on the Hadoop side. Anyway, trunk is better from
what I have observed here.

Rong-En Fan

>  Thanks,
>  St.Ack
