hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Hbase fails at moderate load.
Date Fri, 29 Jan 2010 13:08:27 GMT
Hi Michal,

[Disclaimer: I am not well experienced in HBase]

Those seem like very low memory allocations for HBase, from what I've
seen on this list.  I was told not to consider less than 8G for those
daemons.  It could also be that you need to increase the lease
times to give the split time to complete.
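For reference, the lease and client retry knobs live in hbase-site.xml. A rough fragment (property names are from the 0.20-era defaults; the values here are only guesses to experiment with, not recommendations):

```xml
<!-- Illustrative hbase-site.xml fragment; tune to your workload. -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>120000</value> <!-- row/scanner lease in ms (default 60000) -->
</property>
<property>
  <name>hbase.client.pause</name>
  <value>2000</value>   <!-- ms the client waits between retries (default 1000) -->
</property>
```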

Just an idea, but if you needed to, you could spin up Amazon EC2
XLarge instances for a very small amount of money to prove the concept.


2010/1/29 Michał Podsiadłowski <podsiadlowski@gmail.com>:
> Hi all!
> I'm in the middle of some performance and stability testing of our small
> HBase cluster, to check whether it is suitable for our application.
> We want to use it as the web-cache persistence layer for our web app, which
> handles quite a large amount of traffic.
> Of course I have lots of problems with it.
> The main one is that the client applications (web servers) try to persist or
> retrieve rows and fail miserably with exceptions like this:
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server address
> listed in .META. for region
> oldAppWebSingleRowCacheStore,filmMenuCuriosities-not_selected\xC2\xAC150,1264766907002
>        at
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1048)
>        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:417)
>  could not retrieve persisted cache id 'filmRanking' for key '3872'
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
> region server for region
> oldAppWebSingleRowCacheStore,filmRanking\xC2\xAC3746,1264766860498, row
> 'filmRanking\xC2\xAC3872', but failed after 2 attempts.
> Exceptions:
> org.apache.hadoop.hbase.NotServingRegionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> oldAppWebSingleRowCacheStore,filmRanking\xC2\xAC3746,1264766860498
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2266)
>        at
> org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1730)
> This happens every time the first region starts to split. As far as I can
> see, the table is set to enabled *false* (web admin), the web admin becomes
> a little less responsive, and listing the table's regions shows no regions.
> After a while I can see 500 or more regions, and some of them, as the
> exceptions show, are not fully available. HDFS doesn't seem to be the main
> issue: when I run fsck it says the hbase dir is healthy apart from some
> under-replicated blocks. Occasionally I saw that some blocks were missing,
> but I think this was due to "Too many open files" exceptions (too small a
> region size - now it's the default, 64)
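The "Too many open files" failures mentioned above usually mean the daemon user's file-descriptor limit is still at the stock Linux default of 1024, which is widely reported as too low for HBase. A quick check and one common remedy (assuming a PAM-based distro such as CentOS; the user name and value are only examples):

```shell
# Show the per-process open-file limit for the current user; the
# DataNode/RegionServer daemons inherit this when started from a shell.
ulimit -n

# A common fix is to raise the limit for the daemon user in
# /etc/security/limits.conf and log in again, e.g.:
#   hadoop  -  nofile  32768
```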
> The amount of data is not enormous - these problems start to occur at
> around 1 GB in fewer than 100k rows. Requests per second are, I think,
> small - 20-30 per second.
> What else I can say is that I've set the max HBase retries to only 2,
> because we can't allow clients to wait any longer for a response.
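With retries capped at 2, the client will see RetriesExhaustedException whenever a region is mid-split, so one pattern worth considering is to treat the cache read as optional and fall back to the backing store on failure. A minimal Python sketch of that idea (the fetch callables are hypothetical stand-ins, not real HBase client code):

```python
import time

def get_with_fallback(fetch_from_cache, fetch_from_source, key,
                      retries=2, backoff=0.1):
    """Try the cache a bounded number of times, then fall back.

    fetch_from_cache / fetch_from_source are hypothetical callables;
    in the poster's setup the cache call would wrap an HTable get.
    """
    for attempt in range(retries):
        try:
            return fetch_from_cache(key)
        except Exception:  # e.g. the client's RetriesExhaustedException
            if attempt + 1 < retries:
                time.sleep(backoff * (attempt + 1))  # brief pause, then retry
    # Cache failure is non-fatal: serve from the slower source of truth.
    return fetch_from_source(key)
```

This keeps worst-case client latency bounded by (retries x backoff) plus one source fetch, instead of blocking on a splitting region.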
> What I would like to know is whether the table is always disabled while
> region splits are performed? And is it truly disabled then, so that clients
> can't do anything?
> It looks like the status says disabled, but requests are still processed,
> though with varying results (some like the above).
> My cluster setup is probably useful -
> 3 CentOS virtual machines on Xen, each running DataNode/RegionServer and
> ZooKeeper, with one of them also running the master node and secondary
> master.
> 2 GB of RAM on each. Currently the Hadoop processes run with -Xmx512m and
> HBase with -Xmx256m, but none of them is swapping or running out of memory.
> The GC logs look normal - stop-the-world pauses are not occurring ;)
> top says the CPUs are nearly idle on all machines.
> It's far from ideal, but we need to prove that this can work reliably to get
> more toys.
> Maybe next week we will be able to test on some better machines, but for now
> that's all I've got.
> Any advice is welcome.
> Thanks,
> Michal
