hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-6762) exception while doing RPC I/O closes channel
Date Mon, 03 Dec 2012 22:41:59 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-6762:
--------------------------------

    Attachment: hadoop-6762.txt

Here's an updated patch against trunk.

I ran all of the unit tests in the ipc package locally and they passed. I also tried the new
unit tests _without_ the patch, and they failed as expected.

Given that there was a deadlock found in an early rev of this patch, I also ran all of the
IPC unit tests under jcarder to look for lock inversions and it found none.

I ran the RPCCallBenchmark for 30 seconds with and without the patch, with the following results:

With patch:
====== Results ======
Options:
rpcEngine=class org.apache.hadoop.ipc.ProtobufRpcEngine
serverThreads=30
serverReaderThreads=4
clientThreads=30
host=0.0.0.0
port=12345
secondsToRun=30
msgSize=1024
Total calls per second: 24668.0
CPU time per call on client: 58639 ns
CPU time per call on server: 64893 ns


Without patch:
====== Results ======
Options:
rpcEngine=class org.apache.hadoop.ipc.ProtobufRpcEngine
serverThreads=30
serverReaderThreads=4
clientThreads=30
host=0.0.0.0
port=12345
secondsToRun=30
msgSize=1024
Total calls per second: 27881.0
CPU time per call on client: 68079 ns
CPU time per call on server: 62582 ns
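
The relative differences between the two runs can be worked out directly from the figures reported above. A small sketch follows; the class and method names are illustrative and not part of RPCCallBenchmark, and the inputs are the single 30-second runs posted here, so the percentages should be read as rough indications only.

```java
final class BenchmarkDelta {

    /** Relative change from baseline to patched, in percent (negative means a decrease). */
    static double percentChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the two result blocks above (baseline = without patch).
        System.out.printf("calls per second:    %+.1f%%%n", percentChange(27881.0, 24668.0));
        System.out.printf("client CPU per call: %+.1f%%%n", percentChange(68079.0, 58639.0));
        System.out.printf("server CPU per call: %+.1f%%%n", percentChange(62582.0, 64893.0));
    }
}
```

With these inputs the changes come out at roughly -11.5% for calls per second, -13.9% for client CPU time per call, and +3.7% for server CPU time per call.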

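As expected, the throughput went down by about 12% (24668 vs. 27881 calls per second), since
the RPC calls are now being shuttled between threads on the client side. In these runs the CPU
time per call rose slightly on the server (64893 ns vs. 62582 ns), while the figure reported on
the client was lower with the patch (58639 ns vs. 68079 ns). The throughput drop is
unfortunate, but given that this fixes an important bug, and given that _client_ side RPC
throughput is rarely a bottleneck in common usage scenarios, I think it is acceptable.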
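
To illustrate what "shuttled between threads" can mean, here is a minimal sketch, not the actual patch: the channel write is handed to a dedicated sender thread, so an interrupt delivered to the calling thread never reaches the thread that touches the channel. The class name, the use of a `Pipe` as a stand-in for the RPC connection, and the single-thread executor are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class SenderThreadSketch {

    /**
     * Performs a write on a dedicated sender thread while the calling thread
     * is interrupted, and reports whether the channel is still open afterwards.
     */
    static boolean channelSurvivesCallerInterrupt() {
        ExecutorService sender = Executors.newSingleThreadExecutor();
        try {
            Pipe pipe = Pipe.open();
            Pipe.SinkChannel shared = pipe.sink(); // stands in for the shared RPC connection
            try {
                // Simulate another component interrupting the calling thread.
                Thread.currentThread().interrupt();

                // The caller never touches the channel; only the sender thread does.
                Callable<Integer> send = () -> shared.write(ByteBuffer.wrap(new byte[] {1}));
                Future<Integer> pending = sender.submit(send);

                // Wait for the write; an interrupt only aborts the wait, so retry.
                while (true) {
                    try {
                        pending.get();
                        break;
                    } catch (InterruptedException e) {
                        // The flag was cleared by the exception; keep waiting for the result.
                    }
                }
                return shared.isOpen();
            } finally {
                Thread.interrupted(); // clear the simulated interrupt before cleanup
                shared.close();
                pipe.source().close();
            }
        } catch (IOException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            sender.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("channel still open: " + channelSurvivesCallerInterrupt());
    }
}
```

The extra hand-off and wake-up per call is the kind of overhead that would be expected to cost some client-side throughput, in exchange for keeping the shared channel out of reach of interrupts aimed at caller threads.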

This patch is also nearly identical to a patch that we've shipped in CDH since June 2010,
so I'm fairly confident that the approach is correct.
                
> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: Todd Lipcon
>            Priority: Critical
>         Attachments: hadoop-6762-10.txt, hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt,
>                      hadoop-6762-4.txt, hadoop-6762-6.txt, hadoop-6762-7.txt, hadoop-6762-8.txt, hadoop-6762-9.txt,
>                      HADOOP-6762.patch, hadoop-6762.txt, hadoop-6762.txt, hadoop-6762.txt
>
>
> If a single process creates two distinct FileSystem instances for the same NN using FileSystem.newInstance(),
> and one of them issues a close(), the lease checker thread is interrupted. This interrupt
> races with the RPC namenode.renew() and can cause a ClosedByInterruptException. That exception closes
> the underlying channel, so the other FileSystem, which shares the connection, will get errors.
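
The mechanism described in the issue summary above can be reproduced in isolation. This is a hedged sketch using a plain `java.nio.channels.Pipe` as a stand-in for the shared RPC connection (it does not use any Hadoop classes, and the class and method names are made up for the example). It relies on the documented behavior of interruptible NIO channels: a blocking I/O call made by a thread whose interrupt status is set closes the channel and throws `ClosedByInterruptException`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;

final class InterruptClosesChannel {

    /**
     * Writes to a channel while the calling thread's interrupt flag is set
     * and reports whether the channel ended up closed as a side effect.
     */
    static boolean channelClosedByInterrupt() {
        try {
            Pipe pipe = Pipe.open();
            Pipe.SinkChannel shared = pipe.sink(); // stands in for the shared RPC connection
            try {
                // Simulate close() interrupting the thread that is about to do I/O.
                Thread.currentThread().interrupt();
                shared.write(ByteBuffer.wrap(new byte[] {1}));
                return false; // not expected: the write should fail with the exception below
            } catch (ClosedByInterruptException e) {
                // The interrupt did not just fail this one write: it closed the channel,
                // so every other user of the same channel would now see errors too.
                return !shared.isOpen();
            } finally {
                Thread.interrupted(); // clear the simulated interrupt before cleanup
                shared.close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("channel closed by interrupt: " + channelClosedByInterrupt());
    }
}
```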

