hadoop-common-issues mailing list archives

From "Chris Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9640) RPC Congestion Control
Date Wed, 04 Dec 2013 02:20:37 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838507#comment-13838507 ]

Chris Li commented on HADOOP-9640:

Thanks for the look, Andrew.

bq. Parsing the MapReduce job name out of the DFSClient name is kind of an ugly hack. The
client name also isn't that reliable since it's formed from the client's Configuration, and
more generally anything in the RPC format that isn't a Kerberos token can be faked. Are these
concerns in scope for your proposal?

bq. Tracking by user is also not going to work so well in a HiveServer2 setup where all Hive
queries are run as the hive user. This is a pretty common DB security model, since you need
this for column/row-level security.

This is definitely up for discussion. One way would be to add a new field specifically for
QoS that provides the identity (whether tied to job or user). 

I'm not too familiar with HiveServer2 and what could be done there. Maybe there's some information
that's passed through about the original user?

bq. What's the purpose of separating read and write requests? Write requests take the write
lock, and are thus more "expensive" in that sense, but your example of the listDir of a large
directory is a read operation.

bq. In the "Identify suspects" section, I see that you present three options here. Which one
do you think is best? Seems like you're leaning toward option 3.

bq. Does dropping an RPC result in exponential back-off from the client, a la TCP? Client
backpressure is pretty important to reach a steady state.

The NN-denial-of-service plan (using a multi-level queue) supersedes the RPC congestion control
doc (which focused on identifying bad users).

bq. I didn't see any mention of fair share here, are you planning to adjust suspect thresholds
based on client share?

Clients that over-use resources are automatically throttled by being placed into low-priority
queues, which reins them in. With many users contending for 100% of the server's resources,
each will tend toward an equal share.

Adjusting thresholds at runtime would be a future enhancement.
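To illustrate the idea, here is a minimal Java sketch (hypothetical names, not the actual patch code) of a history-based scheduler that demotes callers to lower-priority queues once their share of recent calls crosses the per-queue thresholds:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the HADOOP-9640 patch: a caller whose share of
// recent calls exceeds a threshold is demoted one queue per threshold crossed
// (queue 0 = highest priority).
class HistoryScheduler {
    private final double[] thresholds;           // e.g. {0.25, 0.25, 0.25} for 4 queues
    private final Map<String, Long> counts = new HashMap<>();
    private long total = 0;

    HistoryScheduler(double[] thresholds) {
        this.thresholds = thresholds;
    }

    // Record a call from `user` and return the queue index it should go to.
    int schedule(String user) {
        counts.merge(user, 1L, Long::sum);
        total++;
        double share = (double) counts.get(user) / total;
        int queue = 0;
        // Each threshold the caller's share exceeds pushes it one queue lower.
        for (double t : thresholds) {
            if (share > t) queue++;
        }
        return queue;
    }
}
```

A caller responsible for nearly all traffic ends up in the lowest-priority queue, while light callers stay in queue 0.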

bq. Any thoughts on how to automatically determine these thresholds? These seem like kind
of annoying parameters to tune.

There are two sets of thresholds to tune:
1. the scheduler thresholds (default: an even split, e.g. 25% each with 4 queues)
2. the multiplexer's round-robin weights (default: a log split, e.g. 2^3 calls from queue 0,
2^2 from queue 1, etc.)

The defaults work pretty well for us, but different clusters will have different loads. The
scheduler will provide JMX metrics to make it easier to tune.

bq. Maybe admin / superuser commands and service RPCs should be excluded from this feature

Currently a config key (like ipc.8020.history-scheduler.service-users) specifies service users,
which are always scheduled into the highest-priority queue. To exclude service RPC calls from
this feature entirely, one could use the dedicated service RPC server.
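Assuming the key format mentioned above (the comment itself says "like", so the exact name and value format are not final), such a setting might look like this in core-site.xml:

```xml
<!-- Hypothetical fragment; key name taken from the comment above and may
     differ in the final patch. Lists users whose calls always go to the
     highest-priority queue on the port-8020 RPC server. -->
<property>
  <name>ipc.8020.history-scheduler.service-users</name>
  <value>hdfs,mapred</value>
</property>
```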

bq. Do you have any preliminary benchmarks supporting the design? Performance is a pretty
important aspect of this design.

I'll put some more numbers up shortly. Some preliminary results are on page 8 of the [attachment|https://issues.apache.org/jira/secure/attachment/12616864/NN-denial-of-service-updated-plan.pdf]

I should have the code up soon as well.

> RPC Congestion Control
> ----------------------
>                 Key: HADOOP-9640
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9640
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Xiaobo Peng
>              Labels: hdfs, qos, rpc
>         Attachments: NN-denial-of-service-updated-plan.pdf, faircallqueue.patch, rpc-congestion-control-draft-plan.pdf
> Several production Hadoop cluster incidents occurred where the Namenode was overloaded
> and failed to be responsive.  This task is to improve the system to detect RPC congestion
> early, and to provide good diagnostic information for alerts that identify suspicious jobs/users
> so as to restore services quickly.
> Excerpted from the communication of one incident, “The map task of a user was creating
> huge number of small files in the user directory. Due to the heavy load on NN, the JT also
> was unable to communicate with NN...The cluster became responsive only once the job was killed.”
> Excerpted from the communication of another incident, “Namenode was overloaded by GetBlockLocation
> requests (Correction: should be getFileInfo requests. the job had a bug that called getFileInfo
> for a nonexistent file in an endless loop). All other requests to namenode were also affected
> by this and hence all jobs slowed down. Cluster almost came to a grinding halt…Eventually
> killed jobtracker to kill all jobs that are running.”
> Excerpted from HDFS-945, “We've seen defective applications cause havoc on the NameNode,
> for e.g. by doing 100k+ 'listStatus' on very large directories (60k files) etc.”
