hbase-user mailing list archives

From Rural Hunter <ruralhun...@gmail.com>
Subject Re: Region server not accept connections intermittently
Date Wed, 09 Jul 2014 01:58:17 GMT
No. I use the standard log4j file, and there is no network problem on
the client side. I checked the web admin UI, and the master still treats
the slave as working; only its request count is very small (about 10,
while the others are at several hundred). I sshed into the slave server,
and netstat shows that port 60020 is open, but I cannot telnet to the
port even from the server itself. It just times out, the same as for
clients on other servers. After it recovered automatically, I could
telnet to port 60020 from both the slave server and the other servers.
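
For reference, the telnet check described above can be repeated from Java so that it can be run in a loop or from the client host. This is only a minimal sketch: the class name PortProbe, the probe method, and the 5000 ms timeout are made up for illustration, while slave2 and 60020 are the host and port from this thread. It separates a connect that times out (the symptom reported here) from one that is refused outright.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class PortProbe {
    /** Outcome of a single TCP connect attempt. */
    enum Result { CONNECTED, TIMED_OUT, REFUSED_OR_ERROR }

    /** Tries to open a TCP connection and reports how the attempt ended. */
    static Result probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Result.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No answer within timeoutMs: matches the telnet timeout in this report.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // Connection refused, unknown host, unreachable network, and so on.
            return Result.REFUSED_OR_ERROR;
        }
    }

    public static void main(String[] args) {
        // Defaults are the region server host and port mentioned in this thread.
        String host = args.length > 0 ? args[0] : "slave2";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 60020;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 5000));
    }
}
```

Running it on the slave itself and on a client machine at the same moment would show whether both see TIMED_OUT, as with telnet.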

This is my server configuration: http://pastebin.com/Ks4cCiaE

Client configuration:
         myConf.set("hbase.zookeeper.quorum", hbaseQuorum);
         myConf.set("hbase.client.retries.number", "3");
         myConf.set("hbase.client.pause", "1000");
         myConf.set("hbase.client.max.perserver.tasks", "10");
         myConf.set("hbase.client.max.perregion.tasks", "10");
         myConf.set("hbase.client.ipc.pool.size", "5");
         myConf.set("zookeeper.recovery.retry", "1");

The error on the client:
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=3, exceptions:
Mon Jul 07 19:10:35 CST 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@69eb9518, org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave2/192.168.2.88:60020]
Mon Jul 07 19:10:58 CST 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@69eb9518, org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave2/192.168.2.88:60020]
Mon Jul 07 19:11:23 CST 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@69eb9518, org.apache.hadoop.net.ConnectTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=slave2/192.168.2.88:60020]

     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
     at org.apache.hadoop.hbase.client.HTable.delete(HTable.java:831)
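
The three timestamps in the trace above show how long each retry takes. The sketch below only does the arithmetic on those timestamps; the class name RetryGaps and the gapSeconds helper are made up for illustration. Each gap comes out a little over the 20000 ms connect timeout, and the remainder is presumably the retry pause configured through hbase.client.pause, although that is an inference rather than something the log states.

```java
import java.time.Duration;
import java.time.LocalTime;

public class RetryGaps {
    /** Seconds elapsed between two HH:mm:ss timestamps on the same day. */
    static long gapSeconds(String earlier, String later) {
        return Duration.between(LocalTime.parse(earlier), LocalTime.parse(later)).getSeconds();
    }

    public static void main(String[] args) {
        // Failure times copied from the RetriesExhaustedException message.
        String[] stamps = {"19:10:35", "19:10:58", "19:11:23"};
        for (int i = 1; i < stamps.length; i++) {
            long gap = gapSeconds(stamps[i - 1], stamps[i]);
            System.out.println(stamps[i - 1] + " -> " + stamps[i] + ": " + gap + " s");
        }
        // prints:
        // 19:10:35 -> 19:10:58: 23 s
        // 19:10:58 -> 19:11:23: 25 s
    }
}
```

If the first attempt also waited out the full connect timeout before failing at 19:10:35, the delete call would have blocked for roughly 68 seconds in total before the exception was thrown.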

On 2014/7/9 1:02, Esteban Gutierrez wrote:
> Hello Rural,
>
> It doesn't seem to be a problem with the region server from what I can
> tell. The RS logs don't show any message about a long pause (unless you
> have a non-standard log4j.properties file), and if the RS had been in a
> very long pause due to GC or any other issue, the master should have
> considered this region server dead, and from the logs it doesn't look
> like that happened. Have you double-checked from the client side for any
> connectivity issue to the RS? Can you pastebin the client and the HBase
> cluster confs?
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
>
> On Tue, Jul 8, 2014 at 2:14 AM, Rural Hunter <ruralhunter@gmail.com> wrote:
>
>> OK, I will try to do that when it happens again. Thanks.
>>
>> On 2014/7/8 17:06, Ted Yu wrote:
>>
>>> Next time this happens, can you take a jstack of the region server and
>>> pastebin it?
>>>
>>> Thanks
>>>
>>>
>>>

