hbase-user mailing list archives

From "Rong-en Fan" <gra...@gmail.com>
Subject Re: too busy host causes NotServingRegion exception?
Date Sat, 19 Apr 2008 15:26:39 GMT
On Sat, Apr 19, 2008 at 12:14 AM, Bryan Duxbury <bryan@rapleaf.com> wrote:
> NotServingRegionExceptions are normal when they appear in the regionserver
> logs. They're not normal when they come out of your client code. You get an
> NSRE when a region gets split or reassigned and the client's cache of the
> region's location is out of date. Normally, the HTable client retries a
> bunch, and eventually it gets sorted out. However, if the
> reassignment/splitting/etc takes longer than all the retries, the client
> will get the NSRE. In general we'd like for those not to happen, but I'm not
> sure that there's actually something wrong.
>
>  When you say once in a while, how frequent are you talking about?

Well, the first one occurred after about an hour of writing, and the second
occurred a few minutes later. However, after I sent the mail, there were no
problems at all for the next couple of hours of writing.

Regards,
Rong-En Fan

>  If you want to tune this problem away, you can edit your hbase-site.xml and
> change hbase.client.retries to be a bigger number and/or hbase.client.pause
> to be longer. That might resolve your issue. If something is actually broken
> in HBase, more retries won't help, and that would be an interesting fact to
> know. If it is just a timing/load issue, then more retries or a longer pause
> will probably fix it. This would also be a really interesting fact to know
> :).
>
>  Glad to hear that trunk erases some of the mystery of 0.16!
>
>  -Bryan
>
>
>
>  On Apr 18, 2008, at 3:29 AM, Rong-en Fan wrote:
>
>
> > I'm running hbase and hadoop-0.17 trunk code as of earlier today (without
> > HBASE-10), loading 50m records into a table with ~800,000 rows and only
> > one column family. This is a 3-node DFS with 3 region servers. I load
> > the data from one of these three boxes. Once in a while, I get a
> > NotServingRegion exception. The code looks like:
> >
> > BatchUpdate bu = new BatchUpdate(row);
> > bu.put(...);
> > table.commit(bu);
> >
> > When I examine region server's log, it shows something like:
> >
> > 08/04/18 01:51:14 open the region in question
> > 08/04/18 01:51:15 region available
> > 08/04/18 01:51:15 starting compaction
> > 08/04/18 01:51:22 region closed
> > 08/04/18 01:51:41 NotServingRegion Exception
> > 08/04/18 01:51:47 compaction done
> > 08/04/18 01:51:51 NotServingRegion Exception
> > 08/04/18 01:52:01 NotServingRegion Exception
> > 08/04/18 01:52:11 NotServingRegion Exception
> > 08/04/18 01:52:21 NotServingRegion Exception
> > 08/04/18 01:52:47 open the region in question
> > 08/04/18 01:52:47 region available
> >
> > The master log somehow got truncated, but IIRC the master tried to assign
> > the region to this region server somewhere between 01:51:22 and 01:51:41.
> >
> > From my understanding, this region server is a little busy so it does not
> > accept the assignment from the master. I'm wondering if this is caused by
> > a too-busy region server (each region server handles about 1000 requests
> > per second), and if so, what configuration variables should I tune?
> > In addition, what would be the best practices when writing a Java client
> > to deal with such exceptions (as NotServingRegion should be common
> > on a very busy HBase instance, I think)?
> >
> > BTW, I was getting lots of different strange failures when doing the same
> > thing on hadoop-0.16.X and hbase-0.1.X. After switching to hbase trunk,
> > I only get the error above. It seems there are no more mysterious
> > exceptions :-D
> >
> > Thanks,
> > Rong-En Fan
> >
>
>
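
The retry-and-pause behaviour discussed above (try the write, wait, try
again, give up after a fixed number of attempts) can also be applied on the
application side, around the commit call. The following is only a minimal
sketch of that idea in plain Java, not HBase's own client code: the class
RetryWithPause, its call method, and RetriesExhaustedException are
hypothetical names made up for illustration, and maxAttempts and pauseMillis
merely play the roles that the retries and pause settings play in the
message above.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of HBase): runs an operation, pausing a
 * fixed time between failed attempts, and gives up after maxAttempts.
 */
public class RetryWithPause {

    /** Thrown when every attempt failed; the cause is the last failure. */
    public static class RetriesExhaustedException extends Exception {
        public RetriesExhaustedException(int attempts, Throwable last) {
            super("gave up after " + attempts + " attempts", last);
        }
    }

    public static <T> T call(Callable<T> op, int maxAttempts, long pauseMillis)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (InterruptedException e) {
                // Never swallow interruption; let the caller handle it.
                throw e;
            } catch (Exception e) {
                // Assumption: every other failure is treated as transient.
                // A real client would likely retry only specific exception
                // types and rethrow the rest immediately.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // wait before the next attempt
                }
            }
        }
        throw new RetriesExhaustedException(maxAttempts, last);
    }
}
```

With such a helper, the write from the original message could be wrapped
as RetryWithPause.call(() -> { table.commit(bu); return null; }, 10, 10000L),
where 10 attempts and a 10-second pause are illustrative values only. If the
exception still surfaces after generous limits, that points at something
other than a timing/load issue, which is the distinction Bryan draws above.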
