hadoop-common-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-6889) Make RPC to have an option to timeout
Date Fri, 29 Jul 2011 21:57:09 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HADOOP-6889:
---------------------------------

    Attachment: HADOOP-6889-for20.2.patch

Thanks for your review and comments, Matt. {quote}1. I see a single new test case, TestIPC.testIpcTimeout(),
that tests the lowest-level timeout functionality, between a client and a TestServer server.
However, I do not see any test cases that check whether the integration of that timeout functionality
with, eg, the InterDatanodeProtocol works as expected. (The mod to TestInterDatanodeProtocol
merely adapts to the change, it does not test the change.) Similarly, no test of timeout in
the context of DFSClient with a MiniDFSCluster. Granted the original patch to trunk doesn't
test these either. But do you feel confident in the patch without such additional tests, and
why?{quote}
I'm uploading a new patch with the added tests on behalf of John George.
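For reference, the timeout tests follow the same basic pattern as the existing TestIPC case: point a client at a server that accepts connections but never answers, and assert that the call fails with a SocketTimeoutException once the timeout elapses. The sketch below shows only that pattern with plain java.net and JUnit 4; the class name and setup are illustrative, not the actual Hadoop TestServer or Client code from the patch.

{code}
import static org.junit.Assert.fail;

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

import org.junit.Test;

public class TimeoutPatternSketch {

  @Test
  public void readTimesOutAgainstSilentServer() throws Exception {
    // A "server" that accepts connections but never writes a response,
    // standing in for a hung peer in the real tests.
    ServerSocket silent = new ServerSocket(0);
    try {
      Socket client = new Socket();
      client.connect(new InetSocketAddress("localhost", silent.getLocalPort()), 1000);
      client.setSoTimeout(500);       // analogous to the RPC timeout under test
      InputStream in = client.getInputStream();
      try {
        in.read();                    // would block forever without the timeout
        fail("expected SocketTimeoutException");
      } catch (SocketTimeoutException expected) {
        // the timeout fired instead of waiting indefinitely -- the desired behavior
      } finally {
        client.close();
      }
    } finally {
      silent.close();
    }
  }
}
{code}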

{quote}2. Are the variances between the trunk and v20 patches due only to code tree divergence,
or are there changes added to the v20 patch that are not in v23 and perhaps should be? Thanks.{quote}
John told me the variances are indeed only because of the tree divergence. 

> Make RPC to have an option to timeout
> -------------------------------------
>
>                 Key: HADOOP-6889
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6889
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: John George
>             Fix For: 0.20-append, 0.20.205.0, 0.22.0
>
>         Attachments: HADOOP-6889-for20.2.patch, HADOOP-6889-for20.patch, HADOOP-6889.patch, ipcTimeout.patch, ipcTimeout1.patch, ipcTimeout2.patch
>
>
> Currently, Hadoop RPC does not time out as long as the RPC server is alive. Instead, the RPC client sends a ping to the server whenever a socket timeout occurs; if the server is still alive, the client continues to wait rather than throwing a SocketTimeoutException. This prevents a client from retrying while a server is busy, which would only make the server busier. This works well when the RPC server is the NameNode.
> But Hadoop RPC is also used for some client-to-DataNode communication, for example to get a replica's length. When a client hits a problematic DataNode, it gets stuck and cannot switch to a different DataNode. In that case it would be better for the client to receive a timeout exception.
> I plan to add a new configuration property, ipc.client.max.pings, that specifies the maximum number of pings a client will attempt. If no response is received after that many pings, a SocketTimeoutException is thrown. If the property is not set, the client keeps the current semantics and waits forever.
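To make the ping-on-timeout semantics described above concrete: each SO_TIMEOUT expiry on the connection triggers a ping, and today the loop repeats indefinitely, whereas with ipc.client.max.pings it would give up after the configured count. The sketch below is illustrative only, assuming that interpretation of the proposal; it uses plain sockets, not the actual org.apache.hadoop.ipc.Client code, and the ping payload is a placeholder rather than the real wire format.

{code}
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Illustrative sketch of the ping-on-timeout loop described above. */
class PingingReader {
  private final Socket socket;
  private final int maxPings;   // hypothetical ipc.client.max.pings; <= 0 means wait forever

  PingingReader(Socket socket, int pingIntervalMs, int maxPings) throws IOException {
    this.socket = socket;
    this.maxPings = maxPings;
    socket.setSoTimeout(pingIntervalMs);  // each read blocks for at most one ping interval
  }

  int readResponseByte(DataInputStream in) throws IOException {
    int pings = 0;
    while (true) {
      try {
        return in.read();                 // server answered
      } catch (SocketTimeoutException e) {
        pings++;
        if (maxPings > 0 && pings >= maxPings) {
          throw e;                        // proposed behavior: give up after max pings
        }
        sendPing();                       // current behavior: ping and keep waiting
      }
    }
  }

  private void sendPing() throws IOException {
    socket.getOutputStream().write(-1);   // placeholder ping payload, not the real wire format
    socket.getOutputStream().flush();
  }
}
{code}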

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
