hbase-user mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject RE: Regionserver fails to serve region
Date Fri, 17 Oct 2008 15:30:06 GMT
Jean-Adrien,

Did you see my reply to your previous email?

I think your machines are underpowered for your current setup, and that is
creating all kinds of problems. If a regionserver/datanode is swapping, that
must be addressed, because swapping usually leads to odd behavior in HDFS,
timeouts, starvation, etc.

Decrease your allotted heap sizes to fit within available memory, or add
more memory.
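The advice above comes down to simple arithmetic: the heaps of all daemons on a node should fit in its physical RAM with room to spare. Below is a hypothetical helper (HeapBudget is not part of Hadoop or HBase) that adds up per-daemon maximum heaps and compares the total with the node's RAM. The heap sizes are assumptions for illustration only; read the real ones from HADOOP_HEAPSIZE in hadoop-env.sh and HBASE_HEAPSIZE in hbase-env.sh. Because -Xmx is an upper bound and the JVM also uses memory outside the heap, treat this as a rough worst-case check, not a measurement.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Back-of-the-envelope check: do the JVM heaps on one node fit in physical RAM?
 * All heap sizes used in main() are HYPOTHETICAL examples, not values from this thread.
 */
class HeapBudget {

    /** Sum of the configured maximum heaps (in MB) of every daemon on the node. */
    static int totalHeapMb(Map<String, Integer> heapsMb) {
        int total = 0;
        for (int mb : heapsMb.values()) {
            total += mb;
        }
        return total;
    }

    /**
     * MB left for the OS, page cache and JVM non-heap overhead.
     * A negative or very small value means the node is likely to swap under load.
     */
    static int headroomMb(int physicalRamMb, Map<String, Integer> heapsMb) {
        return physicalRamMb - totalHeapMb(heapsMb);
    }

    public static void main(String[] args) {
        int ramMb = 1024; // "1Gb ram per node" from the cluster setup

        // Hypothetical: the master/namenode box runs four daemons at 1000 MB each.
        Map<String, Integer> masterNode = new LinkedHashMap<>();
        masterNode.put("namenode", 1000);
        masterNode.put("datanode", 1000);
        masterNode.put("hbase master", 1000);
        masterNode.put("regionserver", 1000);
        System.out.println("master node headroom: " + headroomMb(ramMb, masterNode) + " MB");

        // Hypothetical reduced heaps for a worker running only a datanode and a regionserver.
        Map<String, Integer> worker = new LinkedHashMap<>();
        worker.put("datanode", 256);
        worker.put("regionserver", 512);
        System.out.println("worker headroom: " + headroomMb(ramMb, worker) + " MB");
    }
}
```

A negative headroom on the node that also hosts the namenode and master would be consistent with the swapping described above.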

JG

-----Original Message-----
From: Jean-Adrien [mailto:adv1@jeanjean.ch] 
Sent: Friday, October 17, 2008 1:02 AM
To: hbase-user@hadoop.apache.org
Subject: Regionserver fails to serve region


Hello again.
This is my last message for today.

I often get an exception in my HBase client: a regionserver fails to serve
a region when the client gets a row from the HBase cluster.

org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
region server 192.168.1.15:60020 for region
table-0.3,:testrow79063200,1223872616091, row ':testrow22102600', but failed
after 10 attempts.

The individual attempts above fail as follows:
1:
java.io.IOException: java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
2-10:
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

After that, every time the client tries to reach the same region, attempts
1-10 all fail with:
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

So once the region is in this state, every new request for it fails again
with 10 NPEs.
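The "failed after 10 attempts" message reflects the client retrying the same call a fixed number of times before giving up. Here is a minimal sketch of such a bounded retry loop; BoundedRetry and RetriesExhausted are hypothetical names, not the HBase client API. It illustrates why retrying does not help in this scenario: when the cause is a persistent server-side fault such as the NPE, every attempt fails identically and the client merely throws later.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

/**
 * Hypothetical bounded retry loop, in the spirit of the client giving up
 * "after 10 attempts". Names are illustrative; this is not HBase's API.
 */
class BoundedRetry {

    /** Thrown once every attempt has failed; keeps each attempt's cause. */
    static class RetriesExhausted extends IOException {
        final List<Exception> causes;

        RetriesExhausted(int attempts, List<Exception> causes) {
            super("failed after " + attempts + " attempts");
            this.causes = causes;
        }
    }

    /** Runs op up to maxAttempts times, pausing pauseMs between failed attempts. */
    static <T> T call(Callable<T> op, int maxAttempts, long pauseMs)
            throws RetriesExhausted, InterruptedException {
        List<Exception> causes = new ArrayList<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                causes.add(e); // remember why this attempt failed
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs); // back off before the next try
                }
            }
        }
        throw new RetriesExhausted(maxAttempts, causes);
    }

    public static void main(String[] args) throws Exception {
        // A fault that never clears on the server: all attempts fail the same way.
        Callable<String> brokenRegion = () -> {
            throw new IOException(new NullPointerException());
        };
        try {
            call(brokenRegion, 10, 0);
        } catch (RetriesExhausted e) {
            System.out.println(e.getMessage() + "; first cause: " + e.causes.get(0).getCause());
        }
    }
}
```

A transient fault, by contrast, would let a later attempt succeed, which is the case retries are designed for.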

Another scenario I have seen, in which all 10 attempts fail the same way:
1-10:
IPC Server handler 3 on 60020, call getRow([B@1ec7483, [B@d54a92, null,
1224105427910, -1) from 192.168.1.11:55371: error: java.io.IOException:
Cannot open filename
/hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
java.io.IOException: Cannot open filename
/hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)

This is preceded, in the affected regionserver's log, by the line:

2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-3759213227484579481_226277 from any node: 
java.io.IOException: No live nodes contain current block

If I look for this block in the Hadoop master log, I find the following,
about 16 minutes earlier:

2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask
192.168.1.13:50010 to delete  [...] blk_-3759213227484579481_226277 [...]
(many more blocks)

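The gap between the namenode's delete request and the failed read can be computed directly from the two log lines. This sketch uses a hypothetical helper, BlockTimeline, on abridged copies of the lines quoted above (the "[...]" parts are left out). It checks that both lines name the same block and computes the time between them, assuming the "yyyy-MM-dd HH:mm:ss,SSS" timestamp prefix those lines use.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: extract the timestamp and block id from log lines and
 * measure how long before a failed read the block's deletion was requested.
 */
class BlockTimeline {
    // Timestamp prefix as in the quoted lines, e.g. "2008-10-15 23:03:45,276" (23 chars).
    private static final DateTimeFormatter TS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
    // Block ids look like blk_-3759213227484579481_226277 (id, then generation stamp).
    private static final Pattern BLOCK = Pattern.compile("blk_-?\\d+_\\d+");

    static LocalDateTime timestamp(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), TS);
    }

    static String blockId(String logLine) {
        Matcher m = BLOCK.matcher(logLine);
        if (!m.find()) {
            throw new IllegalArgumentException("no block id in: " + logLine);
        }
        return m.group();
    }

    public static void main(String[] args) {
        // Abridged copies of the namenode and regionserver lines quoted above.
        String deleteLine = "2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: "
                + "BLOCK* ask 192.168.1.13:50010 to delete blk_-3759213227484579481_226277";
        String readLine = "2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: "
                + "Could not obtain block blk_-3759213227484579481_226277 from any node";

        if (!blockId(deleteLine).equals(blockId(readLine))) {
            throw new IllegalStateException("lines refer to different blocks");
        }
        Duration gap = Duration.between(timestamp(deleteLine), timestamp(readLine));
        System.out.printf("%s: delete requested %d min %d s before the failed read%n",
                blockId(readLine), gap.toMinutes(), gap.getSeconds() % 60);
    }
}
```

For these two lines the gap is 15 min 45 s, which matches the "about 16 minutes" above.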
In both cases, the regionserver fails to serve the affected region until I
restart HBase (not Hadoop).

I have no way of knowing whether such a failure is temporary (and if so, for
how long) or whether I really need to restart. However, I noticed that the
failure does not recover within the next 3-4 hours.

One last question, by the way:
Why is the replication factor of my HBase files in DFS 3, when my Hadoop
cluster is configured to keep only 2 copies?
Is it because the default config file (hadoop-default.xml) of the Hadoop
client embedded in the HBase distribution overrides the cluster
configuration for the mapfiles it creates?
Is that a good configuration scheme, or is it preferable to let HBase's
Hadoop client load the hadoop-site.xml file I have set up for the running
Hadoop instance, by adding the Hadoop conf directory to the HBase
classpath, and thereby have the same configuration on the client as on the
server?
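Some background for this question, as far as I know: HDFS stores a replication factor per file, and the client that creates the file picks it from its own dfs.replication setting; the namenode does not substitute the cluster's value. So whichever configuration the HBase-side Hadoop client loads decides the factor for the mapfiles. The sketch below mimics the "later resource overrides earlier" layering of Hadoop-style XML config files using only the JDK. LayeredConf is a hypothetical class for illustration, not Hadoop's Configuration class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical sketch of layered Hadoop-style XML configuration:
 * <configuration><property><name/><value/></property></configuration>.
 * Later resources override earlier ones. Not Hadoop's own Configuration class.
 */
class LayeredConf {

    /** Parse one XML resource into a name -> value map. */
    static Map<String, String> parse(String xml) throws Exception {
        Map<String, String> props = new HashMap<>();
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        NodeList properties = root.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element p = (Element) properties.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    /** Apply resources in order; a later resource overrides an earlier one. */
    static Map<String, String> layer(String... resources) throws Exception {
        Map<String, String> merged = new HashMap<>();
        for (String xml : resources) {
            merged.putAll(parse(xml));
        }
        return merged;
    }

    /** Build a one-property config document (illustration only). */
    static String prop(String name, String value) {
        return "<configuration><property><name>" + name + "</name><value>" + value
                + "</value></property></configuration>";
    }

    public static void main(String[] args) throws Exception {
        String defaults = prop("dfs.replication", "3"); // like the default file in the Hadoop jar
        String site = prop("dfs.replication", "2");     // like the cluster's hadoop-site.xml

        // Only the defaults are visible when the site file is not on the classpath.
        System.out.println("defaults only:   " + layer(defaults).get("dfs.replication"));
        // With the site file loaded after the defaults, its value wins.
        System.out.println("defaults + site: " + layer(defaults, site).get("dfs.replication"));
    }
}
```

Note that, under this model, changing the client configuration only affects files created afterwards; existing files keep the factor they were written with.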

Have a nice day.
Thank you for your advice.

-- Jean-Adrien

Cluster setup:
4 regionservers / datanodes
1 of them is also the master / namenode
java-6-sun
Total size of HDFS: 81.98 GB (replication factor 3)
fsck -> healthy
hadoop: 0.18.1
hbase: 0.18.0 (with its hadoop jar replaced by 0.18.1)
1 GB RAM per node
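To put the replication question in numbers: assuming the 81.98 GB figure is fsck's "Total size" (the logical size of the files, before replication) and that blocks are spread evenly over the 4 datanodes, raw disk usage is simply logical size times replication factor. ReplicaFootprint is a hypothetical helper for this estimate; it ignores datanode checksum files and anything else stored on the disks.

```java
/**
 * Hypothetical rough estimate of raw HDFS disk usage: logical size times
 * replication, spread evenly across datanodes. Ignores checksum (.meta) files.
 */
class ReplicaFootprint {

    /** Total raw GB across the cluster for a given replication factor. */
    static double rawGb(double logicalGb, int replication) {
        return logicalGb * replication;
    }

    /** Raw GB per datanode, assuming an even spread of block replicas. */
    static double perNodeGb(double logicalGb, int replication, int datanodes) {
        return rawGb(logicalGb, replication) / datanodes;
    }

    public static void main(String[] args) {
        double logicalGb = 81.98; // assumed to be fsck's logical "Total size"
        int datanodes = 4;        // from the cluster setup above
        for (int replication : new int[] {3, 2}) {
            System.out.printf("replication %d: %.2f GB raw, ~%.2f GB per datanode%n",
                    replication, rawGb(logicalGb, replication),
                    perNodeGb(logicalGb, replication, datanodes));
        }
    }
}
```

Under these assumptions, going from 3 to 2 copies saves roughly one logical copy, about 82 GB of raw disk across the cluster.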




-- 
View this message in context:
http://www.nabble.com/Regionserver-fails-to-serve-region-tp20028553p20028553.html
Sent from the HBase User mailing list archive at Nabble.com.

