hbase-user mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject RE: Regionserver fails to serve region
Date Fri, 17 Oct 2008 15:30:06 GMT

Did you see my reply to your previous email?

I think your machines are underpowered for your current setup, and that is
creating all kinds of problems.  If a regionserver/datanode is swapping,
that must be addressed first, because it usually leads to odd behavior in
HDFS: timeouts, starvation, etc.

Decrease your allotted heap sizes to fit within available memory, or add
more memory.
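
As a rough illustration of the sizing advice above, the sketch below checks
whether the JVM heaps configured on one node fit in its physical RAM. The
1 GB of RAM per node comes from the cluster description in the quoted
message. Everything else is an assumption: the helper name heap_budget, the
1000 MB maximum heap per daemon (the usual default of HADOOP_HEAPSIZE and
HBASE_HEAPSIZE, not the poster's confirmed settings), and the 200 MB reserved
for the operating system.

```python
def heap_budget(ram_mb, heaps_mb, os_reserve_mb=200):
    """Return (total_heap_mb, headroom_mb) for one node.

    A negative headroom means the configured maximum heaps cannot all fit
    in RAM, so the node can start swapping once the JVMs grow.
    """
    total = sum(heaps_mb.values())
    headroom = ram_mb - os_reserve_mb - total
    return total, headroom


# Hypothetical worker node: one datanode and one regionserver,
# each left at an assumed 1000 MB maximum heap.
node = {"datanode": 1000, "regionserver": 1000}
total, headroom = heap_budget(ram_mb=1024, heaps_mb=node)
print(f"configured heap: {total} MB, headroom: {headroom} MB")
if headroom < 0:
    print("over-committed: lower the heap sizes or add memory")
```

The check is deliberately pessimistic in one way and optimistic in another:
it assumes every JVM reaches its maximum heap, but it ignores non-heap JVM
memory, so real headroom is likely smaller than the number it reports.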


-----Original Message-----
From: Jean-Adrien [mailto:adv1@jeanjean.ch] 
Sent: Friday, October 17, 2008 1:02 AM
To: hbase-user@hadoop.apache.org
Subject: Regionserver fails to serve region

Hello again.
This is my last message for today.

I often get an exception in my HBase client: a regionserver fails to serve
a region when the client gets a row from the HBase cluster.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server for region
table-0.3,:testrow79063200,1223872616091, row ':testrow22102600', but failed
after 10 attempts.

The individual attempts above can fail with:
java.io.IOException: java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

After that, every time the client tries to reach the same region, attempts
1-10 are all:
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

In this case, if the client tries to reach the same region again, all of the
next 10 attempts fail with the NPE.

Another 10-attempt scenario I have seen:
IPC Server handler 3 on 60020, call getRow([B@1ec7483, [B@d54a92, null,
1224105427910, -1) from error: java.io.IOException:
Cannot open filename
java.io.IOException: Cannot open filename

It is preceded, in the log of the regionserver concerned, by the line:

2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-3759213227484579481_226277 from any node: 
java.io.IOException: No live nodes contain current block

If I look for this block in the Hadoop master log, I find:

2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask to delete  [...] blk_-3759213227484579481_226277 [...]
(many more blocks)

about 16 minutes earlier.
In both cases the regionserver fails to serve the region concerned until I
restart HBase (not Hadoop).
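
The gap between the two log lines quoted above can be checked mechanically
instead of by eye. This is only a sketch: the helper name parse_log_ts is
made up for the example, and it assumes both logs use the same
"YYYY-MM-DD HH:MM:SS,mmm" prefix and that the clocks of the two machines
agree.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TS_FORMAT)


# Master log: the block is scheduled for deletion.
deleted = parse_log_ts(
    "2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask to delete"
)
# Regionserver log: the same block can no longer be read.
missing = parse_log_ts(
    "2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block"
)

gap = missing - deleted
print(f"block requested {gap} after the delete was scheduled")
```

The same helper can be applied to any other block ID found in both logs to
see whether a read attempt always follows a delete request.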

I cannot tell whether such a failure is temporary (and if so, for how long)
or whether I really need to restart. I did notice that the failure does not
recover within the next 3-4 hours.

One last question, by the way:
Why is the replication factor of my HBase files in DFS 3, when my Hadoop
cluster is configured to keep only 2 copies?
Is it because the default config file (hadoop-default.xml) of the Hadoop
client embedded in the HBase distribution overrides the cluster
configuration for the mapfiles it creates?
Is that a good configuration scheme, or is it preferable to let the HBase
Hadoop client load the hadoop-site.xml file I have set for the running
Hadoop server, by adding the Hadoop conf directory to HBase, so that the
client has the same configuration as the server?
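
The hypothesis in this question can be pictured with a toy model. In HDFS
the replication factor is a per-file property requested by the client that
creates the file, taken from the client's own dfs.replication value, and
hadoop-site.xml overrides hadoop-default.xml when both are loaded. The
sketch below uses plain dictionaries, not the Hadoop Configuration API; the
function name effective_conf is made up, and the values 3 and 2 are the
ones mentioned in the message.

```python
def effective_conf(defaults, site):
    """Merge two config layers; values from `site` override `defaults`."""
    merged = dict(defaults)
    merged.update(site)
    return merged


hadoop_default = {"dfs.replication": 3}  # bundled hadoop-default.xml
cluster_site = {"dfs.replication": 2}    # the cluster's hadoop-site.xml

# Client that cannot see the cluster's hadoop-site.xml:
# only the bundled default applies.
client_without_site = effective_conf(hadoop_default, {})

# Client with the Hadoop conf directory made visible to it:
# the site value wins.
client_with_site = effective_conf(hadoop_default, cluster_site)

print(client_without_site["dfs.replication"])
print(client_with_site["dfs.replication"])
```

If this model matches the real setup, note that files already written keep
the replication factor they were created with; changing the client
configuration only affects files created afterwards.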

Have a nice day.
Thank you for your advice.

-- Jean-Adrien

Cluster setup:
4 regionservers / datanodes
1 of them is also the master / namenode.
Total size of hdfs: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (jar of hadoop replaced with 0.18.1)
1 GB RAM per node
