hbase-user mailing list archives

From Jean-Adrien <a...@jeanjean.ch>
Subject Regionserver fails to serve region
Date Fri, 17 Oct 2008 08:01:40 GMT

Hello again.
This is my last message for today.

I often get an exception in my HBase client: a regionserver fails to serve
a region when the client gets a row from the HBase cluster.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server for region
table-0.3,:testrow79063200,1223872616091, row ':testrow22102600', but failed
after 10 attempts.

The individual attempts behind the exception above fail with either:
java.io.IOException: java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
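The first failure is raised from `org.apache.hadoop.io.IOUtils.readFully`. As a hedged illustration of what that kind of error means (a hypothetical re-implementation, not Hadoop's actual source), a readFully-style helper keeps reading until the requested number of bytes has arrived and throws an `IOException` if the stream ends first. In other words, the reader received fewer bytes than it expected.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
    // Hypothetical readFully-style helper: copy exactly `len` bytes into `buf`
    // starting at `off`, or fail if the stream ends before that.
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int toRead = len;
        while (toRead > 0) {
            int ret = in.read(buf, off, toRead);
            if (ret < 0) {
                // End of stream reached before all requested bytes were read.
                throw new IOException("Premature EOF from inputStream");
            }
            toRead -= ret;
            off += ret;
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        // Only 5 bytes are available, but 8 are requested.
        InputStream truncated = new ByteArrayInputStream(new byte[5]);
        try {
            readFully(truncated, buf, 0, buf.length);
            System.out.println("read ok");
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `failed: Premature EOF from inputStream`, because the in-memory stream is shorter than the requested length.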

After that, every time the client tries to reach the same region, attempts
1-10 all fail with:
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

In this case, whenever the client tries to reach the same region again, all
of the next 10 attempts fail with the NPE.
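The "failed after 10 attempts" message comes from a bounded retry loop in the client. The sketch below is a hypothetical, simplified model of that pattern, not HBase's implementation: `callWithRetries` and `RetriesExhausted` are illustrative names, and the real client raises `org.apache.hadoop.hbase.client.RetriesExhaustedException`. It shows why a persistent server-side fault produces the same cause on every attempt: retrying only helps when the underlying problem is transient.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class RetrySketch {
    // Hypothetical stand-in for a "retries exhausted" error that keeps
    // the cause of every failed attempt, similar to the listing above.
    static class RetriesExhausted extends Exception {
        final List<Exception> causes;

        RetriesExhausted(int attempts, List<Exception> causes) {
            super("failed after " + attempts + " attempts");
            this.causes = causes;
        }
    }

    // Call `op` up to `maxAttempts` times; return the first successful result,
    // otherwise throw with all collected causes.
    static <T> T callWithRetries(Callable<T> op, int maxAttempts) throws RetriesExhausted {
        List<Exception> causes = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                causes.add(e);
                // A real client would usually pause before retrying; this sketch does not.
            }
        }
        throw new RetriesExhausted(maxAttempts, causes);
    }

    public static void main(String[] args) {
        try {
            // A fault that never goes away: every attempt fails identically.
            callWithRetries(() -> { throw new NullPointerException(); }, 10);
        } catch (RetriesExhausted e) {
            System.out.println(e.getMessage() + ", causes recorded: " + e.causes.size());
        }
    }
}
```

Running `main` prints `failed after 10 attempts, causes recorded: 10`.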

Another 10-attempt scenario I have seen:
IPC Server handler 3 on 60020, call getRow([B@1ec7483, [B@d54a92, null,
1224105427910, -1) from error: java.io.IOException:
Cannot open filename
java.io.IOException: Cannot open filename

This was preceded, in the affected regionserver's log, by the line:

2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-3759213227484579481_226277 from any node: 
java.io.IOException: No live nodes contain current block

If I look for this block in the Hadoop master log, I find

2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask to delete  [...] blk_-3759213227484579481_226277 [...]
(many more blocks)

about 16 minutes earlier.

In both cases, the regionserver fails to serve the affected region until I
restart HBase (not Hadoop).
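The cross-check done by hand above (find the block named in the regionserver's read failure, look it up in the master's deletion order, compare the timestamps) can be sketched in a few lines. This is a hypothetical helper, not part of Hadoop or HBase: the class and method names are illustrative, and it assumes log lines start with a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout seen in the quoted lines. The two input strings are the log lines quoted above, elisions included.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BlockLogCheck {
    // Matches HDFS block names such as blk_-3759213227484579481_226277.
    static final Pattern BLOCK = Pattern.compile("blk_-?\\d+_\\d+");
    // Timestamp layout at the start of the quoted log lines.
    static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Return the first block name in a log line, or null if there is none.
    static String firstBlock(String line) {
        Matcher m = BLOCK.matcher(line);
        return m.find() ? m.group() : null;
    }

    // The first 23 characters hold the timestamp, e.g. "2008-10-15 23:19:30,461".
    static LocalDateTime timestamp(String line) {
        return LocalDateTime.parse(line.substring(0, 23), TS);
    }

    public static void main(String[] args) {
        String readFailure = "2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: "
                + "Could not obtain block blk_-3759213227484579481_226277 from any node:";
        String deleteOrder = "2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: "
                + "BLOCK* ask to delete  [...] blk_-3759213227484579481_226277 [...]";

        String block = firstBlock(readFailure);
        boolean askedToDelete = block != null && deleteOrder.contains(block);
        Duration gap = Duration.between(timestamp(deleteOrder), timestamp(readFailure));
        System.out.println(block + " asked to delete earlier: " + askedToDelete
                + ", gap: " + gap.toMillis() + " ms");
    }
}
```

For these two lines the gap is 945185 ms (15 minutes 45.185 seconds), consistent with the "about 16 minutes" noted above.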

I have no way of knowing whether such a failure is temporary (and if so, for
how long) or whether I really need to restart. However, I noticed that the
failure does not recover within the following 3-4 hours.

One last question, by the way:
Why is the replication factor of my HBase files in DFS 3, when my Hadoop
cluster is configured to keep only 2 copies?
Is it because the default config file (hadoop-default.xml) of the Hadoop
client, which is embedded in the HBase distribution, overrides the cluster
configuration for the mapfiles HBase creates?
Is that a good configuration scheme, or is it preferable to let HBase's
Hadoop client load the hadoop-site.xml file I have set up for the running
Hadoop servers, by adding the Hadoop conf directory to HBase, so that the
client and the servers share the same configuration?
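The effect described in this question can be modeled as layered configuration, where later resources override earlier ones. The sketch below is a simplified, hypothetical model, not Hadoop's `Configuration` class. It assumes that the writing client resolves `dfs.replication` from a bundled defaults file first and a site file second (when that site file is visible to it), and that the replication of a newly created file follows the client's resolved value. The value 3 is Hadoop's default for `dfs.replication`; the value 2 is the cluster setting from the question.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ConfigLayering {
    // Apply configuration layers in order; a key in a later layer overrides
    // the same key in an earlier one (simplified, hypothetical model).
    static Map<String, String> resolve(List<Map<String, String>> layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // Stand-in for the defaults bundled with HBase's Hadoop client.
        Map<String, String> defaults = Map.of("dfs.replication", "3");
        // Stand-in for the cluster's own site file.
        Map<String, String> clusterSite = Map.of("dfs.replication", "2");

        // Client that only sees the bundled defaults.
        System.out.println(resolve(List.of(defaults)).get("dfs.replication"));
        // Client that also sees the cluster's site file.
        System.out.println(resolve(List.of(defaults, clusterSite)).get("dfs.replication"));
    }
}
```

Under these assumptions the first lookup prints `3` and the second prints `2` (the sketch needs Java 9+ for `Map.of` and `List.of`).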

Have a nice day.
Thank you for your advice.

-- Jean-Adrien

Cluster setup:
4 regionservers / datanodes
1 of them is also the master / namenode
Total size of HDFS: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (bundled Hadoop jar replaced with 0.18.1)
1 GB RAM per node

View this message in context: http://www.nabble.com/Regionserver-fails-to-serve-region-tp20028553p20028553.html
Sent from the HBase User mailing list archive at Nabble.com.
