mina-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eugene Petrenko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SSHD-633) Race condition in command execution and SSH_MSG_CHANNEL_SUCCESS
Date Thu, 28 Jan 2016 12:45:39 GMT

    [ https://issues.apache.org/jira/browse/SSHD-633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15121358#comment-15121358
] 

Eugene Petrenko edited comment on SSHD-633 at 1/28/16 12:45 PM:
----------------------------------------------------------------

Yes, of course I do create a new thread from my command implementation. But according to 1.0
sources, there are two actions that are done in parallel:
* the output that I write from my a command running thread I started
* the reply to SSH request via {{org.apache.sshd.common.channel.AbstractChannel#handleRequest}}
method.

Both activities are running in different thread. It may turn out the command thread is executed
*before* {{org.apache.sshd.common.channel.AbstractChannel#handleRequest}} is finished. So
it may start writing command output even before {{org.apache.sshd.common.channel.AbstractChannel#sendResponse}}
is executed.

>From the other hand, SSH channel uses Window to limit a number of packets that are send
within a channel. So, this means, by chance, {{#sendResponse}} message can stuck in a queue.
In my case the implementation of JSCH client did nothing before server replies on the request
message with either success or failure. Here we have a race condition with queue order. If
{{#sendResponse}} is executed fast enough, the message is delivered and client works. If we
are unlucky with OS thread scheduling, we may fill up the window with command output packets,
not the response. 

I agree, and I believe I wrote it, I did an incorrect test. But this test simply shows the
possible thread scheduling order that leads JSCH client to stuck. At least the was helpful
to me to reproduce similar error I saw in production. 

I know about NIO_WORKERS and MAX_PACKET_SIZE settings. I updated the beforehand. Unfortunately,
it did not workout. I also realised the faster command implementation works the higher probability
of JSCH issue is. According to the protocol, SSH client may not request the server to replay
on request command. In that case there is no deadlock possible.


To fix the issue, I added additional synchronisation in my code. I created an inheritor of
ChannelSession as follows
{code}public class SSHChannelSessionEx extends ChannelSession {
  private final ReentrantLock myProcessSessionLock = new ReentrantLock();

  @Override
  public void handleRequest(Buffer buffer) throws IOException {
    myProcessSessionLock.lock();
    try {
      super.handleRequest(buffer);
    } finally {
      myProcessSessionLock.unlock();
    }
  }

  public void waitBeforeCommandStartIsAllowed() throws InterruptedException {
    if (myProcessSessionLock.isHeldByCurrentThread()) {
      throw new Error("Should not be called from Command#start method!");
    }

    //make sure it is not locked and wait for it to continue
    myProcessSessionLock.lock();
    myProcessSessionLock.unlock();
  }
}
{code}

Next, I implemented {{ChannelSessionAware}} in my command implementation. Next I added a call
to a session {{waitBeforeCommandStartIsAllowed()}} method from command running thread BEFORE
any output is written. 

The possible fix in the library could be to make sure {{org.apache.sshd.common.channel.AbstractChannel#sendResponse}}
is called before a command's {{#start}} method is called.  



was (Author: jonnyzzz):
Yes, of course I do create a new thread from my command implementation. But according to 1.0
sources, there are two actions that are done in parallel:
* the output that I write from my a command running thread I started
* the reply to SSH request via {{jetbrains.ssh.server.SSHChannelSessionEx#handleRequest}}
method.

Both activities are running in different thread. It may turn out the command thread is executed
*before* {{jetbrains.ssh.server.SSHChannelSessionEx#handleRequest}} is finished. So it may
start writing command output even before {{jetbrains.ssh.server.SSHChannelSessionEx#sendResponse}}
is executed.

>From the other hand, SSH channel uses Window to limit a number of packets that are send
within a channel. So, this means, by chance, {{#sendResponse}} message can stuck in a queue.
In my case the implementation of JSCH client did nothing before server replies on the request
message with either success or failure. Here we have a race condition with queue order. If
{{#sendResponse}} is executed fast enough, the message is delivered and client works. If we
are unlucky with OS thread scheduling, we may fill up the window with command output packets,
not the response. 

I agree, and I believe I wrote it, I did an incorrect test. But this test simply shows the
possible thread scheduling order that leads JSCH client to stuck. At least the was helpful
to me to reproduce similar error I saw in production. 

I know about NIO_WORKERS and MAX_PACKET_SIZE settings. I updated the beforehand. Unfortunately,
it did not workout. I also realised the faster command implementation works the higher probability
of JSCH issue is. According to the protocol, SSH client may not request the server to replay
on request command. In that case there is no deadlock possible.


To fix the issue, I added additional synchronisation in my code. I created an inheritor of
ChannelSession as follows
{code}public class SSHChannelSessionEx extends ChannelSession {
  private final ReentrantLock myProcessSessionLock = new ReentrantLock();

  @Override
  public void handleRequest(Buffer buffer) throws IOException {
    myProcessSessionLock.lock();
    try {
      super.handleRequest(buffer);
    } finally {
      myProcessSessionLock.unlock();
    }
  }

  public void waitBeforeCommandStartIsAllowed() throws InterruptedException {
    if (myProcessSessionLock.isHeldByCurrentThread()) {
      throw new Error("Should not be called from Command#start method!");
    }

    //make sure it is not locked and wait for it to continue
    myProcessSessionLock.lock();
    myProcessSessionLock.unlock();
  }
}
{code}

Next, I implemented {{ChannelSessionAware}} in my command implementation. Next I added a call
to a session {{waitBeforeCommandStartIsAllowed()}} method from command running thread BEFORE
any output is written. 

The possible fix in the library could be to make sure {{jetbrains.ssh.server.SSHChannelSessionEx#sendResponse}}
is called before a command's {{#start}} method is called.  


> Race condition in command execution and SSH_MSG_CHANNEL_SUCCESS
> ---------------------------------------------------------------
>
>                 Key: SSHD-633
>                 URL: https://issues.apache.org/jira/browse/SSHD-633
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Eugene Petrenko
>            Priority: Critical
>
> We use the library for production. From time to time we see timeout exceptions from clients
that calls our SSH server. It was not clear what is that. Most of the time I saw command suck
reading STDIN. We use JSCH as client here. The problem was reproducing rarely.
> I connect JSCH 1.51 to SSHD server to execute a command
> JSCH client expects to read {{SSH_MSG_CHANNEL_SUCCESS}} or {{SSH_MSG_CHANNEL_FAILURE}}
as response to {{SSH_MSG_CHANNEL_REQUEST}} call to execute command. 
> SSHD implementation calls command's method {{start}} and than posts reply with success
to the channel. It may easily command thread is able to fill send window with DATA messages.
Thus {{SSH_MSG_CHANNEL_SUCCESS}} reply is not delivered. 
> This makes JSCH to wait for the message and fail on timeout. The SSHD server command
is simply suck on reading stdin.
> Here goes the code I managed to reproduce the issue. What I did is to start sending a
huge data block without creating a thread in command's start method. Of course this is a bit
incorrect, but it easily make it fail. 
> The server contains the following command implementation
> {code}
>     final OutputStream out = getOut();
>     out.write(new byte[32 * 1024 * 1024]);
>     out.flush();
>     new Thread(new Runnable() {
>       @Override
>       public void run() {
>         try {
>           getIn().read();
>           onExit(0);
>         } catch (Throwable e) {
>           onExit(1, e.getMessage());
>         }
>       }
>     }).start();
> {code} 
> Jsch client code is follows
> {code}
>         final JSch j = new JSch();
>         final Session session = j.getSession("jonnyzzz", myResource.getHostname(), myResource.getSSHPort());
>         session.setTimeout(60_000);
>         session.connect();
>         final ChannelExec e = (ChannelExec) session.openChannel("exec");
>         e.setCommand("test-buffer-underrun");
>         final InputStream inputStream = new e.getInputStream();
>         e.connect(10_000); //meaningful timeout to reproduce the bug
>         ByteStreams.copy(inputStream, FileUtil.nullOutputStream());
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message