hbase-user mailing list archives

From baggio liu <baggi...@gmail.com>
Subject Re: HBase stability
Date Mon, 13 Dec 2010 16:45:11 GMT
Hi Anze,
   Our production cluster runs HBase 0.20.6 on HDFS (CDH3b2), and we have been
working on stability for about a month. Here are some issues we have met, first
on the HDFS side, which may be helpful to you:

    1. HBase files have a shorter life cycle than MapReduce files; at times
there are many blocks waiting to be deleted, so the speed of HDFS block
invalidation should be tuned.
    2. The Hadoop 0.20 branch cannot handle disk failure well; HDFS-630 will be
helpful here.
    3. The region server does not handle IOException correctly. When the
DFSClient hits a network error it throws an IOException, which may not be fatal
to the region server, so these IOExceptions MUST be reviewed.
    4. A large-scale scan creates many concurrent readers in a short time. The
datanode dataxceiver count must be raised to a large number, and the file-handle
limit should be tuned. In addition, connections between the DFSClient and the
datanode should be reused.
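For point 4, the relevant knob is the datanode transceiver ceiling (the property
name really is spelled "xcievers" in Hadoop of this era); the value below is
only illustrative and should be sized for your own cluster:

```xml
<!-- hdfs-site.xml: raise the concurrent dataxceiver ceiling (value illustrative) -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

The file-handle limit is raised for the user running the daemons, e.g. with
`ulimit -n 32768` or an entry in /etc/security/limits.conf.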

    On the HBase side:
    1. Single-threaded compaction limits compaction speed; it should be made
multi-threaded. (With multi-threaded compaction, the network bandwidth used by
compaction should be limited.)
    2. Single-threaded HLog splitting (reading the HLog) makes HBase downtime
longer; making it multi-threaded can shorten the downtime.
    3. Additionally, some tools should be built, such as a meta region checker,
a fixer, and so on.
    4. The ZooKeeper session timeout should be tuned according to the load on
your HBase cluster.
    5. The GC strategy should be tuned on your region servers and HMaster.
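For points 4 and 5, the knobs look roughly like this (the property and flag
names are real settings of this era; the values are illustrative, not
recommendations):

```xml
<!-- hbase-site.xml: ZooKeeper session timeout in milliseconds (value illustrative) -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>60000</value>
</property>
```

```sh
# hbase-env.sh: a common CMS-based GC setup for region servers (flags illustrative)
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70"
```

A shorter session timeout detects dead region servers faster but risks
expiring healthy servers during long GC pauses, so the two settings should be
tuned together.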

    Besides the above, in a production cluster the data-loss issue should be
fixed as well (currently the Hadoop 0.20-append branch and the CDH3b2 Hadoop
can be used). Because HDFS makes many optimizations for throughput, an
application like HBase (many random reads and writes) requires a lot of tuning
and changes on HDFS.
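To illustrate the bandwidth limit mentioned for multi-threaded compaction
(HBase point 1 above), here is a minimal token-bucket sketch that compaction
worker threads could share. The class and method names are illustrative, not
HBase 0.20 APIs:

```java
// Hypothetical sketch: a shared byte-rate throttle capping the aggregate
// I/O bandwidth of multi-threaded compaction workers.
public class CompactionThrottle {
    private final long bytesPerSecond;   // bucket capacity = sustained rate
    private long availableBytes;         // current budget
    private long lastRefillNanos;

    public CompactionThrottle(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.availableBytes = bytesPerSecond;
        this.lastRefillNanos = System.nanoTime();
    }

    // Block until 'bytes' of budget is available; callers invoke this
    // before each read/write chunk during compaction.
    public synchronized void acquire(long bytes) {
        if (bytes > bytesPerSecond) {
            throw new IllegalArgumentException("request exceeds bucket capacity");
        }
        while (true) {
            long now = System.nanoTime();
            long refill = (now - lastRefillNanos) * bytesPerSecond / 1_000_000_000L;
            if (refill > 0) {
                availableBytes = Math.min(bytesPerSecond, availableBytes + refill);
                lastRefillNanos = now;
            }
            if (availableBytes >= bytes) {
                availableBytes -= bytes;
                return;
            }
            try {
                wait(10); // poll granularity; refilled on next loop iteration
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return; // give up the wait if interrupted (sketch-level handling)
            }
        }
    }
}
```

Each compaction thread calls acquire() with its chunk size, so total
compaction I/O across all threads stays near the configured rate.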
    Hope this experience can be helpful to you.

Thanks & best regards

2010/12/14 Todd Lipcon <todd@cloudera.com>

> Hi Anze,
> In a word, yes - 0.20.4 is not that stable in my experience, and
> upgrading to the latest CDH3 beta (which includes HBase 0.89.20100924)
> should give you a huge improvement in stability.
> You'll still need to do a bit of tuning of settings, but once it's
> well tuned it should be able to hold up under load without crashing.
> -Todd
> On Mon, Dec 13, 2010 at 2:41 AM, Anze <anzenews@volja.net> wrote:
> > Hi all!
> >
> > We have been using HBase 0.20.4 (cdh3b1) in production on 2 nodes for a
> few
> > months now and we are having constant issues with it. We fell over all
> > standard traps (like "Too many open files", network configuration
> > problems,...). All in all, we had about one crash every week or so.
> > Fortunately we are still using it just for background processing so our
> > service didn't suffer directly, but we have lost huge amounts of time
> just
> > fixing the data errors that resulted from data not being written to
> permanent
> > storage. Not to mention fixing the issues.
> > As you can probably understand, we are very frustrated with this and are
> > seriously considering moving to another bigtable.
> >
> > Right now, HBase crashes whenever we run very intensive rebuild of
> secondary
> > index (normal table, but we use it as secondary index) to a huge table. I
> have
> > found this:
> > http://wiki.apache.org/hadoop/Hbase/Troubleshooting
> > (see problem 9)
> > One of the lines read:
> > "Make sure you give plenty of RAM (in hbase-env.sh), the default of 1GB
> won't
> > be able to sustain long running imports."
> >
> > So, if I understand correctly, no matter how HBase is set up, if I run an
> > intensive enough application, it will choke? I would expect it to be
> slower
> > when under (too much) pressure, but not to crash.
> >
> > Of course, we will somehow solve this issue (working on it), but... :(
> >
> > What are your experiences with HBase? Is it stable? Is it just us and the
> way
> > we set it up?
> >
> > Also, would upgrading to 0.89 (cdh3b3) help?
> >
> > Thanks,
> >
> > Anze
> >
> >
> --
> Todd Lipcon
> Software Engineer, Cloudera
