hadoop-common-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11552) Allow handoff on the server side for RPC requests
Date Thu, 02 Apr 2015 18:24:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393093#comment-14393093 ]

Siddharth Seth commented on HADOOP-11552:

I'm interested in getting this patch into a released version of Hadoop. Having it in a released
version does make it easier to consume for downstream projects; and I do intend to use this
feature in Tez - and that can serve as another testbed. Was hoping to get this into 2.7, but
it's too late for that. Will change the target version to 2.8 - which gives more breathing
room to have it reviewed, and tried out in components within Hadoop.

There isn't that much work in the RPC layer itself. Follow-up patches like the shared thread
pool will be more disruptive. When this is used by YARN / HDFS, those patches are likely
to be more involved and make up a larger change set. I can create jiras for some of the YARN
tasks, and would request folks in HDFS to create relevant jiras there.

This could absolutely be done in a branch. If this particular patch is considered 'safe' -
it'd be good to get it into 2.8 even if the rest of the work to use it in sub-components isn't

HADOOP-10300 is related, and this patch borrows elements from there - like I mentioned in
my first comment. If I'm not mistaken, 10300 doesn't allow for a return value. Daryn could
correct me here if I've understood that incorrectly.

Multiplexing UGIs over a single connection - that's TBD, right? We still use distinct connections
per UGI if I'm not mistaken. I don't think the patch affects this path. Are there plans to support
multiplexing responses on a connection - i.e. allow a smaller response through, even if the
responder isn't done with a previous response on the same connection?

> Allow handoff on the server side for RPC requests
> -------------------------------------------------
>                 Key: HADOOP-11552
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11552
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ipc
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: HADOOP-11552.1.wip.txt, HADOOP-11552.2.txt, HADOOP-11552.3.txt,
>                      HADOOP-11552.3.txt, HADOOP-11552.4.txt
> An RPC server handler thread is tied up for each incoming RPC request. This isn't ideal,
> since this essentially implies that RPC operations should be short lived, and most operations
> which could take time end up falling back to a polling mechanism.
> Some use cases where this is useful:
> - YARN submitApplication - which currently submits, followed by a poll to check if the
>   application is accepted while the submit operation is written out to storage. This can be
>   collapsed into a single call.
> - YARN allocate - requests and allocations use the same protocol. New allocations are
>   received via polling.
> The allocate protocol could be split into a request/heartbeat along with an 'awaitResponse'.
> The request/heartbeat is sent only when there's a request or on a much longer heartbeat interval.
> awaitResponse is always left active with the RM - and returns the moment something is available.
> MapReduce/Tez task to AM communication is another example of this pattern.
> The same pattern of splitting calls can be used for other protocols as well. This should
> serve to improve latency, as well as reduce network traffic since the keep-alive heartbeat
> can be sent less frequently.
> I believe there are some cases in HDFS as well, where the DN gets told to perform some
> operations when it heartbeats into the NN.
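
For readers unfamiliar with the idea, the handoff described in the issue can be sketched roughly as below. This is only an illustration of the general pattern and not the API proposed in the attached patches: HandoffSketch, DeferredCall, sendResponse and the single-threaded executors are all hypothetical names and choices made up for this example, and the 50 ms sleep stands in for the write to storage. The handler thread accepts the call, hands it to a background worker and returns immediately; the response is sent later, once the slow work has finished.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; none of these names come from Hadoop's ipc package. */
public class HandoffSketch {

    /** Hypothetical handle that lets a call be answered after the handler has returned. */
    static final class DeferredCall<T> {
        private final CompletableFuture<T> response = new CompletableFuture<>();

        void sendResponse(T value) { response.complete(value); }

        void sendError(Throwable t) { response.completeExceptionally(t); }

        CompletableFuture<T> response() { return response; }
    }

    // One handler thread only, to make it visible that slow calls do not tie it up.
    private final ExecutorService handlers = Executors.newSingleThreadExecutor();
    // Stand-in for whatever performs the slow work (e.g. writing to storage).
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    /** Assumed shape of a submit call that replies only once the application is accepted. */
    CompletableFuture<String> submitApplication(String appId) {
        DeferredCall<String> call = new DeferredCall<>();
        handlers.execute(() -> {
            // The handler only hands the call off and is then free for the next request.
            background.execute(() -> {
                try {
                    Thread.sleep(50); // simulated write to storage
                    call.sendResponse(appId + ":ACCEPTED");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    call.sendError(e);
                }
            });
        });
        return call.response();
    }

    void shutdown() {
        handlers.shutdown();
        background.shutdown();
    }

    public static void main(String[] args) throws Exception {
        HandoffSketch server = new HandoffSketch();
        // A single call replaces "submit, then poll until accepted".
        System.out.println(server.submitApplication("app_1").get(5, TimeUnit.SECONDS));
        server.shutdown();
    }
}
```

An 'awaitResponse' style call would presumably have the same shape: the call is parked in something like the DeferredCall above and completed when an allocation becomes available, without a handler thread waiting on it in the meantime.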
