Hi YuMing, :)
Yes, several iterations of jstack on the problem regionserver could help
identify the problem.

Rural,
You probably hit HBASE-11277 (and YuMing probably did as well): reader 14
loops again and again in the stack below (high CPU usage), while listener 12
is blocked and cannot accept new connections.

Thread 12 (RpcServer.listener,port=60020):
  State: BLOCKED
  Blocked count: 123264191
  Waited count: 0
  Blocked on org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader@77f87716
  Blocked by 14 (RpcServer.reader=1,port=60020)
  Stack:
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.registerChannel(RpcServer.java:598)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:755)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:673)
Thread 24 (RpcServer.responder):

Thread 14 (RpcServer.reader=1,port=60020):
  State: RUNNABLE
  Blocked count: 12510492
  Waited count: 12826560
  Stack:
    sun.nio.ch.FileDispatcher.read0(Native Method)
    sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
    sun.nio.ch.IOUtil.read(IOUtil.java:224)
    sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
    org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2438)
    org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
    org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1498)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
    org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:701)
Thread 13 (RpcServer.reader=0,port=60020):

2014-07-10 14:13:49,614 WARN [RpcServer.reader=7,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
        at sun.nio.ch.IOUtil.read(IOUtil.java:224)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2404)
        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1425)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:780)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:568)
        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:543)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)

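
If you collect several dumps a few seconds apart, a small script can show which threads never move. This is only a rough sketch in Python, not part of HBase: it assumes each dump is plain text with a "Thread N (name):" header per thread followed by "State:" and "Stack:" lines, and the function names parse_dump and unchanged_threads are made up for illustration.

```python
import re

# Assumed header shape: "Thread 12 (RpcServer.listener,port=60020):"
THREAD_RE = re.compile(r"^Thread (\d+) \((.+)\):\s*$")
STATE_RE = re.compile(r"^State:\s*(\S+)")


def parse_dump(text):
    """Return {thread_name: (state, top_frame)} for one dump."""
    threads = {}
    name = state = None
    in_stack = False
    for raw in text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            name, state, in_stack = m.group(2), None, False
            threads[name] = (None, None)
            continue
        if name is None:
            continue
        m = STATE_RE.match(line)
        if m:
            state = m.group(1)
            threads[name] = (state, None)
        elif line == "Stack:":
            in_stack = True
        elif in_stack and line:
            threads[name] = (state, line)
            in_stack = False  # only the top frame is kept
    return threads


def unchanged_threads(dumps):
    """Threads whose (state, top frame) is identical in every dump."""
    parsed = [parse_dump(d) for d in dumps]
    if not parsed:
        return {}
    first = parsed[0]
    return {
        name: info
        for name, info in first.items()
        if info[1] is not None and all(p.get(name) == info for p in parsed[1:])
    }
```

A listener that shows up as BLOCKED in registerChannel in every dump, next to a reader that is always RUNNABLE in read0, would match the pattern described above. Idle threads parked in the same frame will also be reported, so the result still needs a manual look.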
On Mon, Jul 14, 2014 at 9:24 AM, Rural Hunter <ruralhunter@gmail.com> wrote:
> Yes. But you may want to check if there are many connections in SYN_RECV
> state when the problem happens.
>
>
> On 2014/7/14 4:18, vito wrote:
>
>> Hi Rural ,
>>
>>
>> Do you mean the following action you have taken? Thanks a lot.
>>
>> "Anyway, I just changed these kernel settings:
>> net.core.somaxconn=1024 (original 128)
>> net.ipv4.tcp_synack_retries=2 (original 5) "
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-hbase.679495.n3.nabble.com/hbase-region-servers-refuse-connection-tp4061278p4061293.html
>> Sent from the HBase User mailing list archive at Nabble.com.
>> .
>>
>>
>
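
For the SYN_RECV check mentioned in the quoted message, here is a rough Python sketch that counts such connections from the text of /proc/net/tcp on Linux, where the "st" column holds the TCP state as a hex code and 03 means SYN_RECV. The function name count_syn_recv is made up for illustration, and 60020 is simply the regionserver port from the dump above.

```python
SYN_RECV = "03"  # TCP state code used in /proc/net/tcp on Linux


def count_syn_recv(proc_net_tcp_text, port=None):
    """Count SYN_RECV rows, optionally only those on a given local port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXADDR:HEXPORT" for the local end of the socket
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if fields[3] == SYN_RECV and (port is None or local_port == port):
            count += 1
    return count
```

Feeding it the contents of /proc/net/tcp, for example count_syn_recv(open("/proc/net/tcp").read(), port=60020), gives the number of half-open connections on that port. If the JVM listens on an IPv6 socket, the entries are in /proc/net/tcp6 instead, which uses the same column layout.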