hadoop-common-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9956) RPC listener inefficiently assigns connections to readers
Date Tue, 12 Nov 2013 16:59:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820244#comment-13820244 ]

Kihwal Lee commented on HADOOP-9956:

bq. Closing idle connections might be the only option, if you don't want client to DoS server
trivially, accidental or not, by opening too many idle connections. If an application protocol
cares about idempotence, the application should handle it, i.e., we should fix job client
to avoid submitting duplicate jobs. Otherwise many network issues will cause the same problem.
We can even make it a little more client-friendly by responding with an empty RPC frame with
a busy code before closing the connection.

I fully agree. The limitation exists even today, and we need a way for the RPC server to
better protect itself. As suggested above, it would be nice if the server could make the client
cooperate by sending back something like EBUSY. If done right, this can spread out sharp peaks.
Capping the number of allowed connections may also be necessary, but this is beyond the scope
of this jira. [~daryn], would you file a jira to address this issue?
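The server-side protection discussed above (closing idle connections, capping the number of connections, and answering with a busy code instead of silently dropping the client) can be sketched as a small bookkeeping class. This is a hypothetical illustration only: `ConnectionGuard`, `tryAdmit`, `touch`, and `reapIdle` are invented names, not part of Hadoop's `ipc.Server`. The busy reply itself is left to the caller because its wire format was not specified in this thread. Times are passed in explicitly so the policy can be exercised without a real clock.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical admission and idle-reaping policy; not Hadoop's actual API. */
public class ConnectionGuard {
  private final int maxConnections;
  private final long idleTimeoutMs;
  // connection id -> time (ms) of its last request, in admission order
  private final Map<String, Long> lastActive = new LinkedHashMap<>();

  public ConnectionGuard(int maxConnections, long idleTimeoutMs) {
    this.maxConnections = maxConnections;
    this.idleTimeoutMs = idleTimeoutMs;
  }

  /** Returns false when full; the caller would send a (hypothetical) busy reply, then close. */
  public synchronized boolean tryAdmit(String connId, long nowMs) {
    if (lastActive.size() >= maxConnections) {
      return false;
    }
    lastActive.put(connId, nowMs);
    return true;
  }

  /** Records activity on an admitted connection; unknown ids are ignored. */
  public synchronized void touch(String connId, long nowMs) {
    lastActive.computeIfPresent(connId, (id, t) -> nowMs);
  }

  /** Removes and returns every connection idle longer than the timeout. */
  public synchronized List<String> reapIdle(long nowMs) {
    List<String> idle = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = lastActive.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> e = it.next();
      if (nowMs - e.getValue() > idleTimeoutMs) {
        idle.add(e.getKey());
        it.remove();
      }
    }
    return idle;
  }

  public static void main(String[] args) {
    ConnectionGuard g = new ConnectionGuard(2, 1000);
    System.out.println(g.tryAdmit("a", 0));    // true
    System.out.println(g.tryAdmit("b", 0));    // true
    System.out.println(g.tryAdmit("c", 0));    // false: at the cap, reply busy
    g.touch("b", 900);
    System.out.println(g.reapIdle(1500));      // [a]: idle 1500 ms > 1000 ms
    System.out.println(g.tryAdmit("c", 1500)); // true: a slot was freed
  }
}
```

In `main`, the third admission is refused because the cap is two. Once `a` has been idle for 1500 ms against a 1000 ms timeout it is reaped, while `b` (last active at 900 ms) survives, which frees a slot for `c`.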

As for the patch, the change looks good to me. +1.

> RPC listener inefficiently assigns connections to readers
> ---------------------------------------------------------
>                 Key: HADOOP-9956
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9956
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: ipc
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HADOOP-9956.branch-23.patch, HADOOP-9956.patch, HADOOP-9956.patch
> The socket listener and readers use complex synchronization to update each reader's
> NIO {{Selector}}. Updating active selectors is not thread-safe, so precautions are required.
> However, the current locking choreography serializes the distribution of new
> connections to the parallel socket readers. A slower or busier reader can stall the listener
> and throttle performance.
> The problem manifests as unexpectedly low CPU utilization by the listener and readers
> (~20-30%) under heavy load. The call queue is shallow when it should be overflowing.
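One way to avoid the serialized hand-off described above is to keep all {{Selector}} registration on the reader's own thread: the listener only enqueues the accepted channel on a per-reader queue and calls `Selector.wakeup()`, so it never waits on a busy reader. The sketch below assumes this queue-plus-wakeup design; it illustrates the pattern and is not claimed to be the code in the attached patches. `ReaderHandoff`, `Reader.addConnection`, `Listener.assign`, and `demo` are hypothetical names, and `Pipe` source channels stand in for accepted sockets so the example runs without a network.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderHandoff {
  /** Hypothetical reader: only its own thread ever registers channels with its Selector. */
  static final class Reader implements Runnable {
    private final Selector selector;
    private final BlockingQueue<SelectableChannel> pending = new LinkedBlockingQueue<>();
    final AtomicInteger registered = new AtomicInteger();
    private volatile boolean running = true;

    Reader() throws IOException {
      selector = Selector.open();
    }

    /** Called by the listener: enqueue, then wake the reader; never touches the key set. */
    void addConnection(SelectableChannel ch) {
      pending.add(ch);
      selector.wakeup(); // interrupts a blocked select(), or makes the next one return at once
    }

    void shutdown() {
      running = false;
      selector.wakeup();
    }

    @Override
    public void run() {
      try {
        while (running) {
          // Drain hand-offs on this thread, so register() never contends with select().
          SelectableChannel ch;
          while ((ch = pending.poll()) != null) {
            ch.configureBlocking(false);
            ch.register(selector, SelectionKey.OP_READ);
            registered.incrementAndGet();
          }
          selector.select();
          selector.selectedKeys().clear(); // a real reader would read the ready channels here
        }
        selector.close();
      } catch (IOException e) {
        throw new RuntimeException(e);
      }
    }
  }

  /** Hypothetical listener: round-robin hand-off that never blocks on a reader. */
  static final class Listener {
    private final Reader[] readers;
    private int next = 0;

    Listener(Reader[] readers) {
      this.readers = readers;
    }

    void assign(SelectableChannel ch) {
      readers[next].addConnection(ch);
      next = (next + 1) % readers.length;
    }
  }

  /** Starts readers, hands off nConns pipe channels, returns per-reader registration counts. */
  public static int[] demo(int nReaders, int nConns) throws Exception {
    Reader[] readers = new Reader[nReaders];
    Thread[] threads = new Thread[nReaders];
    for (int i = 0; i < nReaders; i++) {
      readers[i] = new Reader();
      threads[i] = new Thread(readers[i], "reader-" + i);
      threads[i].start();
    }
    Listener listener = new Listener(readers);
    Pipe[] pipes = new Pipe[nConns];
    for (int i = 0; i < nConns; i++) {
      pipes[i] = Pipe.open();
      listener.assign(pipes[i].source()); // stands in for an accepted SocketChannel
    }
    int[] counts = new int[nReaders];
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
    while (Arrays.stream(counts).sum() < nConns && System.nanoTime() < deadline) {
      Thread.sleep(10);
      for (int i = 0; i < nReaders; i++) counts[i] = readers[i].registered.get();
    }
    for (int i = 0; i < nReaders; i++) {
      readers[i].shutdown();
      threads[i].join();
    }
    for (Pipe p : pipes) {
      p.source().close();
      p.sink().close();
    }
    return counts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Arrays.toString(demo(2, 4))); // per-reader registration counts
  }
}
```

The listener's only interactions with a reader are a non-blocking `LinkedBlockingQueue.add` and `Selector.wakeup()`, so a reader that is slow to return from `select()` delays only its own registrations rather than the listener's hand-off to the other readers.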

This message was sent by Atlassian JIRA
