mina-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hugo Arès (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SSHD-348) Some SSH threads get blocked in Object.wait() method forever
Date Mon, 16 Feb 2015 21:02:12 GMT

    [ https://issues.apache.org/jira/browse/SSHD-348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323288#comment-14323288

Hugo Arès commented on SSHD-348:

I uploaded a file called debugLogs.txt which contains logs about 3 stuck threads. I isolated
logs related to the channel and recipient that is stuck in waitForSpace and I also filtered
out the log "Send SSH_MSG_CHANNEL_DATA on channel 1" since it is flooding and I am not convinced
it is helping to debug this issue.

In 3 examples, we see the 2 following lines but never the one saying that the channel is closed:
DEBUG org.apache.sshd.server.channel.ChannelSession : Closing ChannelSession[id=x, recipient=x]
DEBUG org.apache.sshd.server.channel.ChannelSession : Wait 5000 ms for shell to exit cleanly

In 2 examples (1 and 2), there is a stack trace but I do not know if it is related to the
problem so I included it.

> Some SSH threads get blocked in Object.wait() method forever
> ------------------------------------------------------------
>                 Key: SSHD-348
>                 URL: https://issues.apache.org/jira/browse/SSHD-348
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0
>         Environment: Gerrit Code Review 2.9.1 (SSHD 0.12.0)
> Gerrit Code Review 2.9.2 (SSHD 0.13.0)
> Gerrit Code Review 2.9.3 (Downgraded to SSHD 0.9)
>            Reporter: David Ostrovsky
>            Assignee: Guillaume Nodet
>            Priority: Blocker
>             Fix For: 0.14.0
>         Attachments: 0001-Prepare-release-sshd-0.13.0-72f868e26.patch, debugLogs.txt,
diff, threaddump.txt
> This seems to be a regression started from 0.10.1.
> In Gerrit we have SSH commamds that returns immediately and so called stream-events command
which keeps connection open until clients disconnect. Basically for days or weeks. This is
used for example to inform CI system (jenkins) about events in gerrit, like new patch set
upload etc.
> In Gerrit we have configurable "SSH-Stream-Worker" thread pool which is dedicated to
the mentioned stream-events SSH command. The observed behaviour on latest SSHD release is
that after some time all threads are stuck. They aren't stuck at the same time, but one after
another untill at some time all threads are stuck and Gerrit must be restarted. Usually after
one week. The stack dump of all such stuck thread are the same, see below. If we had a patch
we could apply it to our production Gerrit instance and try if this helps.
> {code}
> "SSH-Stream-Worker-10" cpu=95400.00 [reset 95400.00] ms elapsed=146444.30 [reset 146444.30]
s allocated=5526700000 B (5.15 GB) [reset 5526700000 B (5.15 GB)] defined_classes=0
> io= file i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0  [reset file
i/o: 15622752/0 B, net i/o: 0/0 B, files opened:0, socks opened:0 ] 
> prio=10 tid=0x00007f54514df800 nid=0x1c71 / 7281  pthread-id=139999281374976 in Object.wait()
 [_thread_blocked (_at_safepoint), stack(0x00007f541f5f6000,0x00007f541f6f7000)] [0x00007f541f6f5000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(J)V(Native Method)
> 	- waiting on <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
> 	at java.lang.Object.wait()V(Object.java:503)
> 	at org.apache.sshd.common.channel.Window.waitForSpace()I(Window.java:148)
> 	- locked <0x00007f553aa530d0> (a org.apache.sshd.common.channel.Window)
> 	at org.apache.sshd.common.channel.ChannelOutputStream.flush()V(ChannelOutputStream.java:116)
> 	- locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
> 	at org.apache.sshd.common.channel.ChannelOutputStream.write([BII)V(ChannelOutputStream.java:84)
> 	- locked <0x00007f553aa55060> (a org.apache.sshd.common.channel.ChannelOutputStream)
> 	at sun.nio.cs.StreamEncoder.writeBytes()V(StreamEncoder.java:221)
> 	at sun.nio.cs.StreamEncoder.implFlushBuffer()V(StreamEncoder.java:291)
> 	at sun.nio.cs.StreamEncoder.implFlush()V(StreamEncoder.java:295)
> 	at sun.nio.cs.StreamEncoder.flush()V(StreamEncoder.java:141)
> 	- locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
> 	at java.io.OutputStreamWriter.flush()V(OutputStreamWriter.java:229)
> 	at java.io.BufferedWriter.flush()V(BufferedWriter.java:254)
> 	- locked <0x00007f553aa7e258> (a java.io.OutputStreamWriter)
> 	at java.io.PrintWriter.flush()V(PrintWriter.java:320)
> 	- locked <0x00007f553aa7e210> (a java.io.BufferedWriter)
> 	at java.io.PrintWriter.checkError()Z(PrintWriter.java:357)
> 	at com.google.gerrit.sshd.commands.StreamEvents.writeEvents()V(StreamEvents.java:186)
> 	at com.google.gerrit.sshd.commands.StreamEvents.access$100(Lcom/google/gerrit/sshd/commands/StreamEvents;)V(StreamEvents.java:43)
> 	at com.google.gerrit.sshd.commands.StreamEvents$3.run()V(StreamEvents.java:82)
> 	at java.util.concurrent.Executors$RunnableAdapter.call()Ljava/lang/Object;(Executors.java:471)
> 	at java.util.concurrent.FutureTask.run()V(FutureTask.java:262)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Ljava/util/concurrent/ScheduledThreadPoolExecutor$ScheduledFutureTask;)V(ScheduledThreadPoolExecutor.java:178)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run()V(ScheduledThreadPoolExecutor.java:292)
> 	at com.google.gerrit.server.git.WorkQueue$Task.run()V(WorkQueue.java:364)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run()V(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run()V(Thread.java:812)
> {code}

This message was sent by Atlassian JIRA

View raw message