hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase Exceptions on version 0.20.1
Date Wed, 21 Oct 2009 02:39:10 GMT
JG points to load as the problem because all the signs indicate it. Load is usually the
culprit behind DFS "no live block" errors: the namenode is too busy and/or falling behind,
or the datanodes are falling behind or actually failing. Also, in the log snippets you provide,
HBase is complaining about writes to DFS (for the WAL) taking in excess of 2 seconds, which is
also highly indicative of load, specifically write load. Shortly after this, ZooKeeper sessions
begin expiring, which is also usually a sign of overloading: heartbeats miss their deadline.

When I see these signs on my test clusters, I/O wait is generally in excess of 40%. 

If your total CPU load is really just a few % (user + system + iowait), then I'd suggest you
look at the storage layer. Is there anything in the datanode logs that seems like it might
be relevant?
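
For reference, here is a rough sketch of how the I/O wait share could be measured on a
Linux node by comparing two samples of the aggregate "cpu" line in /proc/stat. The function
names and the 5-second interval are illustrative assumptions, not anything from HBase or
Hadoop, and tools such as iostat or vmstat report the same figure.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (Linux):
# user nice system idle iowait irq softirq steal [guest guest_nice]
IOWAIT_INDEX = 4


def parse_cpu_line(line):
    """Return the counters from an aggregate 'cpu ...' line as integers."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    # Guest time is already included in user/nice, so sum only the first 8 fields.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[IOWAIT_INDEX] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Hypothetical helper: sample /proc/stat twice, `interval` seconds apart."""
    with open(path) as f:
        before = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_line(f.readline())
    return iowait_percent(before, after)
```

Running sample_iowait() on each datanode while the test is in progress would show whether
I/O wait is near the 40% range mentioned above or really just a few percent.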

What about the network? Gigabit? Any potential sources of contention? Are you tracking network
utilization metrics during the test?

Also, you might consider using Ganglia to monitor and correlate system metrics and HBase and
HDFS metrics during your testing, if you are not doing this already. 

   - Andy

From: elsif <elsif.then@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 21, 2009 7:47:43 AM
Subject: Re: HBase Exceptions on version 0.20.1

The cpu load on each of the nodes never goes above 1 and very little if
any swap is in use.

The 4MB is our boundary case - the actual data will always be smaller. 
This was just an encapsulated test that reproduces our issue.

I will re-test with the CMS collector and logging enabled and reply back.

> Most of these exceptions look related to overloaded servers (GC pauses
> causing timeouts, high IO wait tripping up the datanodes, etc).  Have
> you turned on GC logging?  Also, are you swapping on these nodes?
> Check out the performance tuning page here:
> http://wiki.apache.org/hadoop/PerformanceTuning
> The WrongRegionException at the end could be a fault but it's hard to
> know without seeing the entire context and knowing what the cluster
> was up to at that point.
> Performance can degrade as the JVMs fill up, get more and more
> fragmented, and the GC gets slower.
> Also, you are inserting 4MB values?  Those are fairly large, at the
> upper-end of what you would want to put into HBase.  Is this your
> actual use case?  At the least you'll want to increase your region
> size (otherwise you're going to have at most 64 rows per region, often
> less), but also consider if HBase is the right place to store 4MB values.
> Hope that helps.
> JG
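
The "at most 64 rows per region" figure quoted above can be checked with a little
arithmetic. This sketch assumes a region split threshold (hbase.hregion.max.filesize) of
256 MB, which is the value that 64 rows of 4 MB implies. The helper name is made up for
illustration, and the calculation ignores key and metadata overhead, which is why the real
count is often lower.

```python
MB = 1024 * 1024


def max_rows_per_region(region_max_bytes, value_bytes):
    """Upper bound on rows per region when each row holds one value of this size."""
    if value_bytes <= 0:
        raise ValueError("value size must be positive")
    return region_max_bytes // value_bytes


# Assumed 256 MB split threshold with 4 MB values.
print(max_rows_per_region(256 * MB, 4 * MB))   # 64

# Raising the threshold to 1 GB (an example value) allows more rows per region.
print(max_rows_per_region(1024 * MB, 4 * MB))  # 256
```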
