hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Wilfred Spiegelenburg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11252) RPC client write does not time out by default
Date Fri, 28 Nov 2014 07:28:15 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228120#comment-14228120
] 

Wilfred Spiegelenburg commented on HADOOP-11252:
------------------------------------------------

A first version of a patch to set a value for the write timeout. I have used the proposed
write timeout property name as given by [~cmccabe]. I have set it to an arbitrary default
of 5 minutes, which seems reasonable. We still allow anyone to still set it to 0 (no timeout)
if they want it.

The cases described by [~mingma] above YARN-2714, HDFS-4858 and YARN-2578 all have the same
no response to a write cause. Setting this timeout should solve all these issues.
One point I think needs to be checked is the getTimeOut() in Client.java. It uses the ping
interval as a timeout, if ping is not enabled. I think that this should be changed to use
the same timeout as this change introduces. Also changing the time out based on the ping interval
is not really logical. This jira might not be the correct one to introduce that change so
I left it out.

> RPC client write does not time out by default
> ---------------------------------------------
>
>                 Key: HADOOP-11252
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11252
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.5.0
>            Reporter: Wilfred Spiegelenburg
>            Assignee: Wilfred Spiegelenburg
>            Priority: Critical
>         Attachments: HADOOP-11252.patch
>
>
> The RPC client has a default timeout set to 0 when no timeout is passed in. This means
that the network connection created will not timeout when used to write data. The issue has
shown in YARN-2578 and HDFS-4858. Timeouts for writes then fall back to the tcp level retry
(configured via tcp_retries2) and timeouts between the 15-30 minutes. Which is too long for
a default behaviour.
> Using 0 as the default value for timeout is incorrect. We should use a sane value for
the timeout and the "ipc.ping.interval" configuration value is a logical choice for it. The
default behaviour should be changed from 0 to the value read for the ping interval from the
Configuration.
> Fixing it in common makes more sense than finding and changing all other points in the
code that do not pass in a timeout.
> Offending code lines:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L488
> and 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/RPC.java#L350



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message