hadoop-common-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10597) Evaluate if we can have RPC client back off when server is under heavy load
Date Fri, 01 Aug 2014 05:58:39 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081950#comment-14081950 ]

Ming Ma commented on HADOOP-10597:

Thanks, Jing and Arpit.

1. In the current implementation, the RPC server only throws RetriableException back to the client
when the RPC queue is full, or more specifically when the RPC queue is full for the RPC user with
HADOOP-9460. So before the RPC queue is full, there should be no difference. It might be interesting
to verify the "large number of connections" scenario. The blocking approach could hold up lots of
TCP connections, so that other users' requests can't connect.

2. The value of a server-defined backoff policy. So far I don't have any use case that requires
the server to specify a backoff policy. So it is possible that all we need is to have the RPC server
throw RetriableException without a backoff policy. I put it there for extensibility and based on
Steve's suggestion. This might still be useful later. What if the client doesn't honor the policy?
In a controlled environment, we can assume a single client will use the Hadoop RPC client, which
enforces the policy; if we have many clients, then the backoff policy component in the RPC server,
such as LinearClientBackoffPolicy, can keep state and adjust the backoff policy parameters.

3. How this is related to HADOOP-9640. HADOOP-9640 is quite useful, and client backoff can be
complementary to it. FairQueue is currently blocking; if a given RPC request's enqueue to FairQueue
is blocked due to FairQueue policy, it will hold up the TCP connection and the reader threads. If
we use FairQueue together with client backoff, requests from a heavy-load application won't
hold up the TCP connection and the reader threads, thus allowing other applications' requests to be
processed more quickly. Some evaluation to compare HADOOP-9640 with "HADOOP-9640 + client
backoff" might be useful. I will follow up with Chris Li on that.
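
To make the three points above concrete, here is a rough sketch of the server side: a bounded
call queue that rejects instead of blocking, plus a stateful linear policy that suggests a delay.
This is not the code in the attached patches. RetryLaterException, LinearBackoffSketch and
BackoffCallQueue are hypothetical names, and carrying the delay inside the exception is an
assumption made only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for RetriableException; carries a suggested delay (assumption). */
class RetryLaterException extends RuntimeException {
    final long suggestedDelayMillis;

    RetryLaterException(long suggestedDelayMillis) {
        super("call queue is full, retry in " + suggestedDelayMillis + " ms");
        this.suggestedDelayMillis = suggestedDelayMillis;
    }
}

/** Hypothetical server-side policy: the suggested delay grows linearly with rejections. */
class LinearBackoffSketch {
    private final long stepMillis;
    private final long maxMillis;
    private final AtomicInteger recentRejections = new AtomicInteger();

    LinearBackoffSketch(long stepMillis, long maxMillis) {
        this.stepMillis = stepMillis;
        this.maxMillis = maxMillis;
    }

    /** Records a rejection and returns the delay to suggest, capped at maxMillis. */
    long onReject() {
        int n = recentRejections.incrementAndGet();
        return Math.min(maxMillis, n * stepMillis);
    }

    /** Records an accepted call and relaxes the policy by one step. */
    void onAccept() {
        recentRejections.updateAndGet(n -> Math.max(0, n - 1));
    }
}

/** Hypothetical bounded call queue contrasting the blocking and backoff approaches. */
class BackoffCallQueue<T> {
    private final BlockingQueue<T> queue;
    private final LinearBackoffSketch policy;

    BackoffCallQueue(int capacity, LinearBackoffSketch policy) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.policy = policy;
    }

    /** Blocking approach: the reader thread and its TCP connection wait for space. */
    void enqueueBlocking(T call) throws InterruptedException {
        queue.put(call);
    }

    /** Backoff approach: fail fast when full so the reader thread is released. */
    void enqueueOrReject(T call) {
        if (queue.offer(call)) {
            policy.onAccept();
            return;
        }
        throw new RetryLaterException(policy.onReject());
    }

    /** Removes the next call, or returns null if the queue is empty. */
    T poll() {
        return queue.poll();
    }
}
```

Before the queue fills up, enqueueOrReject behaves the same as enqueueBlocking, which matches
point 1. The difference only shows once the queue is full: the blocking variant parks the reader
thread, while the backoff variant returns immediately with a delay hint.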

Are there any other scenarios? For example, we can have RPC reject requests based on user
ID, method name or machine IP for some operational situations. Granted, these can also be
handled at a higher layer.

> Evaluate if we can have RPC client back off when server is under heavy load
> ---------------------------------------------------------------------------
>                 Key: HADOOP-10597
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10597
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HADOOP-10597-2.patch, HADOOP-10597.patch, RPCClientBackoffDesignAndEvaluation.pdf
> Currently if an application hits NN too hard, RPC requests will be in blocking state, assuming
OS connection doesn't run out. Alternatively RPC or NN can throw some well-defined exception
back to the client based on certain policies when it is under heavy load; client will understand
such exception and do exponential back off, as another implementation of RetryInvocationHandler.
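
The client-side behavior described in the issue can be sketched roughly as follows. This is a
standalone illustration, not an implementation of RetryInvocationHandler and not the code in the
attached patches. ServerBusyException and ExponentialBackoffRetry are hypothetical names, and the
base delay, cap and attempt limit are assumed parameters.

```java
import java.util.function.Supplier;

/** Hypothetical signal that the server is under heavy load and asked for a retry. */
class ServerBusyException extends RuntimeException {
    ServerBusyException(String message) {
        super(message);
    }
}

/** Hypothetical client-side helper that retries with exponentially growing delays. */
class ExponentialBackoffRetry {

    /** Delay for a zero-based attempt: base, 2*base, 4*base, ... capped at maxMillis. */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        // Bound the shift so very large attempt numbers cannot overflow the long.
        long delay = baseMillis << Math.min(attempt, 20);
        return Math.min(maxMillis, delay);
    }

    /** Invokes rpc, sleeping between attempts while the server reports it is busy. */
    static <T> T call(Supplier<T> rpc, int maxAttempts, long baseMillis, long maxMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return rpc.get();
            } catch (ServerBusyException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and surface the last rejection
                }
                try {
                    Thread.sleep(delayMillis(attempt, baseMillis, maxMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new RuntimeException("interrupted while backing off", ie);
                }
            }
        }
    }
}
```

A caller would wrap each RPC in something like ExponentialBackoffRetry.call(() -> doRpc(), 5, 100, 10000),
where doRpc is a placeholder for the real invocation. A production version would probably add
random jitter so that many rejected clients do not all retry at the same moment.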

This message was sent by Atlassian JIRA
