hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9150) Unnecessary DNS resolution attempts for logical URIs
Date Mon, 17 Dec 2012 18:58:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Todd Lipcon updated HADOOP-9150:

    Attachment: tracing-resolver.tgz

To diagnose this, I wrote a wrapper implementation of the NameService SPI which logs all resolutions.
Attached is the source for the tracing implementation along with a log I captured on a test
cluster. Here you can see a DNS lookup coming from the path canonicalization code:

java.lang.Exception: looking up ha-nn-uri
        at MyNameservice.lookupAllHostAddr(MyNameservice.java:11)
        at org.apache.hadoop.security.SecurityUtil$StandardHostResolver.getByName(SecurityUtil.java:538)
        at org.apache.hadoop.security.SecurityUtil.getByName(SecurityUtil.java:526)
        at org.apache.hadoop.net.NetUtils.canonicalizeHost(NetUtils.java:283)
        at org.apache.hadoop.net.NetUtils.getCanonicalUri(NetUtils.java:255)
        at org.apache.hadoop.fs.FileSystem.getCanonicalUri(FileSystem.java:214)
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:524)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:401)
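The tracing idea above is a decorator. It records a stack trace at every lookup and then delegates to the real resolver, which is how a trace like the one above ("java.lang.Exception: looking up ha-nn-uri") points back at the caller. Below is a minimal sketch of that pattern. It does not implement the JDK-internal NameService SPI that the attached wrapper used. Instead it assumes a hypothetical `HostResolver` interface, and `TracingResolverDemo` is likewise an illustrative name, not the code in tracing-resolver.tgz.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class TracingResolverDemo {

    /** Hypothetical resolver abstraction (assumption; stands in for the NameService SPI). */
    interface HostResolver {
        InetAddress getByName(String host) throws UnknownHostException;
    }

    /** Decorator: records the caller's stack for every lookup, then delegates. */
    static final class TracingResolver implements HostResolver {
        private final HostResolver delegate;
        private final List<Exception> traces = new ArrayList<>();

        TracingResolver(HostResolver delegate) {
            this.delegate = delegate;
        }

        @Override
        public InetAddress getByName(String host) throws UnknownHostException {
            // The Exception is never thrown; it only captures which code path asked for DNS.
            Exception trace = new Exception("looking up " + host);
            traces.add(trace);
            trace.printStackTrace(System.err);
            return delegate.getByName(host);
        }

        List<Exception> traces() {
            return traces;
        }
    }

    public static void main(String[] args) {
        // Fake delegate that rejects every name, standing in for a "not found" DNS answer.
        HostResolver notFound = host -> { throw new UnknownHostException(host); };
        TracingResolver tracing = new TracingResolver(notFound);
        try {
            tracing.getByName("ha-nn-uri");
        } catch (UnknownHostException expected) {
            System.out.println("lookup failed for " + expected.getMessage());
        }
        System.out.println("lookups traced: " + tracing.traces().size());
    }
}
```

Because the wrapper only observes and delegates, it can sit in front of any resolver without changing lookup results. That is what makes it safe to drop into a test cluster for diagnosis.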
> Unnecessary DNS resolution attempts for logical URIs
> ----------------------------------------------------
>                 Key: HADOOP-9150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9150
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: log.txt, tracing-resolver.tgz
> In the FileSystem code, we accidentally try to DNS-resolve the logical name before it
> is converted to an actual domain name. In some DNS setups this causes a significant slowdown:
> e.g., in one misconfigured cluster we saw a 2-3x drop in terasort throughput, because every
> task spent a lot of time waiting for slow "not found" responses from DNS.
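The description suggests a direction for a fix: recognize a logical name and skip DNS canonicalization for it, so only real host names reach the resolver. Below is a minimal sketch of that idea. It assumes the set of logical names (for example, configured HA nameservice IDs) is known up front. `LogicalUriCanonicalizer` and its `HostResolver` interface are hypothetical names, not Hadoop's API or the actual patch for this issue.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Set;

public class LogicalUriCanonicalizer {

    /** Hypothetical hook returning a canonical host name; lets DNS calls be faked or counted. */
    interface HostResolver {
        String canonicalHostName(String host) throws UnknownHostException;
    }

    private final Set<String> logicalNames; // assumed: logical (HA) names known in advance
    private final HostResolver resolver;

    LogicalUriCanonicalizer(Set<String> logicalNames, HostResolver resolver) {
        this.logicalNames = logicalNames;
        this.resolver = resolver;
    }

    /** Canonicalizes the host of a physical URI; logical URIs are returned as-is with no lookup. */
    URI canonicalize(URI uri) {
        String host = uri.getHost();
        if (host == null || logicalNames.contains(host)) {
            return uri; // logical name: nothing for DNS to resolve
        }
        try {
            String canonical = resolver.canonicalHostName(host);
            return new URI(uri.getScheme(), uri.getUserInfo(), canonical,
                    uri.getPort(), uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (UnknownHostException | URISyntaxException e) {
            return uri; // leave hosts that cannot be resolved unchanged
        }
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        // Fake resolver: counts calls and knows exactly one physical host.
        HostResolver resolver = host -> {
            lookups[0]++;
            if (host.equals("nn1.example.com")) {
                return "nn1.prod.example.com";
            }
            throw new UnknownHostException(host);
        };
        LogicalUriCanonicalizer c =
                new LogicalUriCanonicalizer(Collections.singleton("ha-nn-uri"), resolver);
        System.out.println(c.canonicalize(URI.create("hdfs://ha-nn-uri/tmp")));
        System.out.println(c.canonicalize(URI.create("hdfs://nn1.example.com:8020/tmp")));
        System.out.println("DNS lookups: " + lookups[0]);
    }
}
```

Only the physical URI reaches the resolver, so the slow "not found" round trip for the logical name disappears.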

