hadoop-common-issues mailing list archives

From "sam rash (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6762) exception while doing RPC I/O closes channel
Date Thu, 03 Jun 2010 15:09:58 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875140#action_12875140 ]

sam rash commented on HADOOP-6762:

So you mean adding Thread.currentThread().interrupt() above markClosed(e)?  I don't think this
fixes the underlying problem.

The problem is that if a thread is blocked waiting for I/O on a channel and an interrupt comes in,
you get a ClosedByInterruptException.

From the doc, this closes the channel and sets the thread's interrupt status (i.e. calling interrupt()
again shouldn't have any effect, which means my test doesn't repro the same thing I saw).
What I then saw was that other RPC instances using the same channel would get ClosedChannelException.
The only way to avoid this in the FileSystem case was to move the channel I/O onto its own thread
so that the lease checker interrupt can't reach it.
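
For reference, here is a minimal standalone sketch of the behavior described above. It is not
the Hadoop RPC code: a java.nio Pipe stands in for the shared RPC socket channel, and the class
name, the Result holder and the 200 ms sleep are made up for illustration. It interrupts a
thread blocked in a channel read, then tries to use the same channel from another thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.Pipe;

public class InterruptClosesChannelDemo {

    /** Outcome flags recorded by run(); hypothetical holder, for illustration only. */
    public static final class Result {
        public boolean gotClosedByInterrupt;
        public boolean interruptStatusSet;
        public boolean channelOpenAfterwards;
        public boolean secondUserGotClosedChannel;
    }

    public static Result run() throws IOException, InterruptedException {
        final Result result = new Result();
        // Assumption: the pipe's source end plays the role of the shared RPC channel.
        final Pipe pipe = Pipe.open();

        Thread reader = new Thread(() -> {
            try {
                // Blocks, because nothing is ever written to the sink end.
                pipe.source().read(ByteBuffer.allocate(16));
            } catch (ClosedByInterruptException e) {
                result.gotClosedByInterrupt = true;
                // The interrupt status is still set when the exception is delivered.
                result.interruptStatusSet = Thread.currentThread().isInterrupted();
            } catch (IOException e) {
                // Not expected in this sketch; leaves the flags false.
            }
        });
        reader.start();
        Thread.sleep(200);   // give the reader time to block in read()
        reader.interrupt();  // plays the role of the interrupt aimed at the lease checker
        reader.join();       // join() makes the reader's writes visible here

        // The interrupt closed the channel for every user, not just the reader.
        result.channelOpenAfterwards = pipe.source().isOpen();
        try {
            // A second, uninterrupted user of the same channel.
            pipe.source().read(ByteBuffer.allocate(16));
        } catch (ClosedChannelException e) {
            result.secondUserGotClosedChannel = true;
        }
        pipe.sink().close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        Result r = run();
        System.out.println("reader got ClosedByInterruptException: " + r.gotClosedByInterrupt);
        System.out.println("reader interrupt status set: " + r.interruptStatusSet);
        System.out.println("channel still open: " + r.channelOpenAfterwards);
        System.out.println("second user got ClosedChannelException: " + r.secondUserGotClosedChannel);
    }
}
```

The second read fails even though its thread was never interrupted, which is the same shape as
the other RPC instances failing on the shared connection.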

I'll play with the test case and see why your change makes it pass when it doesn't seem like
it can fix the underlying problem.

> exception while doing RPC I/O closes channel
> --------------------------------------------
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, hadoop-6762-4.txt
> If a single process creates two unique fileSystems to the same NN using FileSystem.newInstance(),
> and one of them issues a close(), the leasechecker thread is interrupted.  This interrupt
> races with the rpc namenode.renew() and can cause a ClosedByInterruptException.  This closes
> the underlying channel, and the other filesystem, sharing the connection, will get errors.

