hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-6762) exception while doing RPC I/O closes channel
Date Mon, 03 Dec 2012 23:03:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509297#comment-13509297 ]

Todd Lipcon commented on HADOOP-6762:

Also figured I'd write up a short summary of this, since the above discussion is long and
somewhat hard to follow after 2.5 years and ~15 attachments :)

The issue at hand is what happens when an IPC caller thread (i.e. the user thread that is making
an IPC call, for example to the NN) is interrupted while it is in the process of writing the call
to the wire. Java NIO's semantics are that a ClosedByInterruptException is thrown on the blocked
thread, _and also that the underlying channel is closed_. In the context of IPC, this meant
that the caller thread would receive a ClosedByInterruptException, and that any other threads
which were sharing the same IPC socket would then receive ClosedChannelExceptions, even though
those other threads were never meant to be interrupted.
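The NIO behaviour described above can be reproduced with a small standalone sketch. It assumes a java.nio.channels.Pipe sink as a stand-in for the shared IPC socket (both are interruptible channels), and it sets the caller's interrupt status just before the write instead of racing a real interrupt against a blocked write; the InterruptibleChannel contract closes the channel and throws ClosedByInterruptException in both cases. The class and method names are hypothetical, not Hadoop code.

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptClosesChannel {
    // Runs one write attempt on a fresh thread and returns the exception it hit (or null).
    static Throwable writeOnThread(Pipe.SinkChannel sink, boolean interruptFirst)
            throws InterruptedException {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread t = new Thread(() -> {
            if (interruptFirst) {
                // Simulates the IPC caller being interrupted: its interrupt status
                // is already set when the blocking write starts.
                Thread.currentThread().interrupt();
            }
            try {
                sink.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            } catch (Throwable e) {
                failure.set(e);
            }
        });
        t.start();
        t.join();
        return failure.get();
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        Pipe.SinkChannel shared = pipe.sink(); // stand-in for the shared IPC socket

        Throwable callerError = writeOnThread(shared, true);     // the interrupted caller
        Throwable bystanderError = writeOnThread(shared, false); // another user of the socket

        System.out.println("caller:    " + callerError);
        System.out.println("bystander: " + bystanderError);
        System.out.println("channel open? " + shared.isOpen());
        pipe.source().close();
    }
}
```

Running it should show a ClosedByInterruptException for the interrupted caller, a ClosedChannelException for the thread that was never interrupted (possibly a subclass such as AsynchronousCloseException, depending on the JDK), and the channel reported as closed.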

The solution is to change the call-sending code such that the actual write() call happens
on a new thread, created by the {{SEND_PARAMS_EXECUTOR}} in the patch. Since the user code
has no reference to this thread, it won't ever get interrupted, even if someone interrupts
the user thread making the call. So, the user thread will receive an InterruptedException,
but any other threads using the same socket continue to run unaffected.
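The shape of the fix can be sketched as follows, assuming one private executor whose threads are never handed to callers. SEND_EXECUTOR, InterruptSafeSender and send() are hypothetical names standing in for the patch's {{SEND_PARAMS_EXECUTOR}} and the real IPC client code, which is not reproduced here; a Pipe again stands in for the socket.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InterruptSafeSender {
    // Hypothetical stand-in for SEND_PARAMS_EXECUTOR: user code never sees these
    // threads, so it has no way to interrupt them. Daemon threads let the JVM exit.
    private static final ExecutorService SEND_EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "hypothetical-ipc-send");
        t.setDaemon(true);
        return t;
    });

    private final WritableByteChannel channel;
    private final Object writeLock = new Object(); // one request on the wire at a time

    InterruptSafeSender(WritableByteChannel channel) {
        this.channel = channel;
    }

    // The actual write() runs on an executor thread; the caller only waits for it.
    // An interrupt of the caller surfaces as InterruptedException from Future.get()
    // and never reaches the thread doing the channel I/O.
    void send(ByteBuffer data) throws InterruptedException, IOException {
        Future<?> done = SEND_EXECUTOR.submit(() -> {
            synchronized (writeLock) {
                while (data.hasRemaining()) {
                    channel.write(data);
                }
            }
            return null;
        });
        try {
            done.get();
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        InterruptSafeSender sender = new InterruptSafeSender(pipe.sink());

        Thread.currentThread().interrupt(); // the calling thread gets interrupted
        try {
            sender.send(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            System.out.println("write completed before the interrupt was noticed");
        } catch (InterruptedException e) {
            System.out.println("caller saw InterruptedException");
        }
        Thread.interrupted(); // clear any leftover interrupt status

        // Other users of the same channel keep working.
        sender.send(ByteBuffer.wrap(new byte[] {4, 5}));
        System.out.println("channel still open: " + pipe.sink().isOpen());
    }
}
```

In this sketch a caller that gives up with InterruptedException does not cancel its submitted write, so the request is still written in full and the shared stream is not left holding a half-written call. That choice belongs to the sketch and is not a description of what the patch does with abandoned calls.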

> exception while doing RPC I/O closes channel
> --------------------------------------------
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hadoop-6762-10.txt, hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt,
>                      hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt, hadoop-6762-8.txt, hadoop-6762-9.txt,
>                      HADOOP-6762.patch, hadoop-6762.txt, hadoop-6762.txt, hadoop-6762.txt
> If a single process creates two distinct FileSystem instances for the same NN using FileSystem.newInstance(),
> and one of them issues a close(), the leasechecker thread is interrupted. This interrupt
> races with the RPC namenode.renew() and can cause a ClosedByInterruptException. That closes
> the underlying channel, and the other FileSystem, which shares the connection, will then get errors.

This message is automatically generated by JIRA.
