mina-dev mailing list archives

From "Markus Rathgeb (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SSHD-721) deadlock: all nio workers wait to be woken up
Date Sun, 11 Dec 2016 14:01:00 GMT

    [ https://issues.apache.org/jira/browse/SSHD-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15739780#comment-15739780 ]

Markus Rathgeb commented on SSHD-721:
-------------------------------------

Hi,

thank you for your reply.
It seems I did not express my problem clearly.
Let me try again; please bear with me.


The code example I posted above was created for one and only one reason:
to give you a way to reproduce the problem.
The example is a very heavy stress test, but I did not find another way to reproduce the
problem in a simple manner.

The problem I run into is that the same number of sessions can be handled most of the time,
but sometimes the server runs into a deadlock.

Also, ONE client using a forwarded port can drive the server into that deadlock situation
(I hope the naming is okay).
If ONE client causes this scenario, neither that client nor any other client is served
any further.

The non-blocking I/O API is used, so the number of clients does not depend (directly) on the
number of workers. Correct?
If I increase the number of workers, more work can be handled in parallel. Is this
assumption correct?

If my usage of the sshd code in the example I created to demonstrate the problem is correct
(I really don't know, perhaps I am doing something wrong), changing the number of NIO workers
sounds to me like changing a unit test so that it does not fail anymore.

To me the situation looks like this:
one HTTP client that connects to a web server is able to drive the server into a state
in which it does not talk to any client anymore. I think we agree this would be a bug in the
server code, don't we?

For me it would be okay if no new SSH session could be created because no further one could
be handled.
But it looks like all existing ones are not handled anymore either.
Also, there is no real escape other than dropping the socket connection manually.

To summarize:
* ports are forwarded by sshd and those ports are used by some clients
* the same number of clients can be handled in parallel most of the time
* one client can use one forwarded port in a way that results in a situation where all workers
wait for something
* the one client whose communication caused that situation is not served anymore
* no other client can use any forwarded port anymore
* the server cannot be used by any client anymore
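
The pattern in the summary can be sketched without any sshd classes. This is only an
illustration under an assumption that the thread dump alone does not confirm: that the event
which would wake a blocked handler must itself run on a worker from the same fixed-size pool.
The class WorkerStarvationSketch and the method handlersWokenUp are hypothetical names, and
only java.util.concurrent is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class WorkerStarvationSketch {

    /**
     * Submits `handlers` tasks that block (with a timeout) until a wake-up
     * task has run on the SAME pool. Returns true only if every handler
     * was woken up before its timeout expired.
     */
    static boolean handlersWokenUp(int workers, int handlers, long timeoutMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            CountDownLatch wakeUp = new CountDownLatch(1);

            // Stand-in for a handler that waits inside a worker thread,
            // similar in shape to the waitFor frames in the thread dump.
            Callable<Boolean> handler =
                    () -> wakeUp.await(timeoutMillis, TimeUnit.MILLISECONDS);

            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < handlers; i++) {
                results.add(pool.submit(handler));
            }

            // Assumption: the wake-up needs a worker from the same pool.
            // If all workers are blocked, this task stays in the queue.
            Runnable waker = wakeUp::countDown;
            pool.submit(waker);

            boolean allWoken = true;
            for (Future<Boolean> result : results) {
                allWoken &= result.get();
            }
            return allWoken;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two workers, two blocking handlers: the waker is queued behind them.
        System.out.println("2 workers, 2 handlers: " + handlersWokenUp(2, 2, 500));
        // Three workers, two blocking handlers: one worker is free for the waker.
        System.out.println("3 workers, 2 handlers: " + handlersWokenUp(3, 2, 500));
    }
}
```

In this sketch the first call reports false and the second reports true. Adding workers only
moves the threshold: as soon as the number of blocked handlers reaches the pool size, nothing
is left to run the wake-up, which is why raising the worker count looks like a workaround
rather than a fix to me.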

> deadlock: all nio workers wait to be woken up
> ---------------------------------------------
>
>                 Key: SSHD-721
>                 URL: https://issues.apache.org/jira/browse/SSHD-721
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: Markus Rathgeb
>
> I am using sshd-core for a server machine (S) that accepts incoming connections and port forwarding requests.
> There are client machines (C) that run servers that should be accessible through a tunnel to the server.
> On the client machines (C) an implementation using sshd-core is also running; it establishes the connection to the server (S) and initiates the port forwarding.
> Other clients use the tunnelled connections to communicate with the servers that are running on the client machines (C).
> Sometimes I noticed that no data is transferred anymore (through the tunnels).
> All the workers sit in the waitFor function and no one wakes them up.
> {noformat}
> "sshd-SshServer[67de991c]-nio2-thread-3" - Thread t@125
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <132c6b60> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <6d02d7ed> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-2" - Thread t@124
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <7e9f4eff> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <35fbf3e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-nio2-thread-1" - Thread t@122
>    java.lang.Thread.State: TIMED_WAITING
> 	at java.lang.Object.wait(Native Method)
> 	- waiting on <49ce93a9> (a java.lang.Object)
> 	at org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)
> 	at org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)
> 	at org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker.invokeDirect(Invoker.java:157)
> 	at sun.nio.ch.UnixAsynchronousSocketChannelImpl.implRead(UnixAsynchronousSocketChannelImpl.java:553)
> 	at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:276)
> 	at sun.nio.ch.AsynchronousSocketChannelImpl.read(AsynchronousSocketChannelImpl.java:297)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:304)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.doReadCycle(Nio2Session.java:249)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:243)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:239)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:235)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:231)
> 	at org.apache.sshd.common.io.nio2.Nio2Session.startReading(Nio2Session.java:227)
> 	at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:178)
> 	at org.apache.sshd.common.io.nio2.Nio2Acceptor$AcceptCompletionHandler.onCompleted(Nio2Acceptor.java:156)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown Source)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)
> 	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
> 	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
> 	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> 	- locked <7d1c59e6> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "sshd-SshServer[67de991c]-timer-thread-1" - Thread t@105
>    java.lang.Thread.State: TIMED_WAITING
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for <655c080c> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
> 	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
> - None
> {noformat}
> To rule out other code that could trigger the error, I created a "minimal" example -- a simple test application -- that can be used to demonstrate the hang (for me it is reproducible using that code).
> Please have a look at https://github.com/maggu2810/sshd-deadlock/tree/first-report where you can also find a readme with a short description of the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
