hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9150) Unnecessary DNS resolution attempts for logical URIs
Date Mon, 17 Dec 2012 18:58:13 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-9150:
--------------------------------

    Attachment: tracing-resolver.tgz
                log.txt

To diagnose this, I wrote a wrapper implementation of the NameService SPI that logs every resolution.
Attached are the source for the tracing implementation and a log I captured on a test
cluster. The log shows a DNS lookup originating in the path canonicalization code:

{code}
java.lang.Exception: looking up ha-nn-uri
        at MyNameservice.lookupAllHostAddr(MyNameservice.java:11)
...
        at org.apache.hadoop.security.SecurityUtil$StandardHostResolver.getByName(SecurityUtil.java:538)
        at org.apache.hadoop.security.SecurityUtil.getByName(SecurityUtil.java:526)
        at org.apache.hadoop.net.NetUtils.canonicalizeHost(NetUtils.java:283)
        at org.apache.hadoop.net.NetUtils.getCanonicalUri(NetUtils.java:255)
        at org.apache.hadoop.fs.FileSystem.getCanonicalUri(FileSystem.java:214)
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:524)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:401)
...
{code}
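A tracing wrapper of the kind described above can be sketched as a decorator that captures a stack trace for every lookup before delegating to the real resolver. The JDK's NameService SPI is a JDK-internal interface, so this sketch does not reproduce the attached code. It assumes a hypothetical `HostResolver` interface whose method name mirrors `lookupAllHostAddr` from the trace; `HostResolver` and `TracingResolver` are illustrative names only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical resolver abstraction standing in for the JDK-internal
// NameService SPI; the method name mirrors the one in the stack trace.
interface HostResolver {
    InetAddress[] lookupAllHostAddr(String host) throws UnknownHostException;
}

// Decorator that records (and prints) a stack trace for every lookup, so the
// call site that triggered each DNS query shows up in the log.
final class TracingResolver implements HostResolver {
    private final HostResolver delegate;
    private final List<Exception> traces = new ArrayList<>();

    TracingResolver(HostResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public InetAddress[] lookupAllHostAddr(String host) throws UnknownHostException {
        // Constructing an Exception captures the current call stack.
        Exception trace = new Exception("looking up " + host);
        synchronized (traces) {
            traces.add(trace);
        }
        // Prints "java.lang.Exception: looking up <host>" plus the frames.
        trace.printStackTrace();
        return delegate.lookupAllHostAddr(host);
    }

    // Snapshot of all recorded lookups, in call order.
    List<Exception> traces() {
        synchronized (traces) {
            return new ArrayList<>(traces);
        }
    }
}
```

Because the trace is recorded before delegating, lookups that end in "not found" (like `ha-nn-uri`) are logged too, which is exactly the case of interest here.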
                
> Unnecessary DNS resolution attempts for logical URIs
> ----------------------------------------------------
>
>                 Key: HADOOP-9150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9150
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: log.txt, tracing-resolver.tgz
>
>
> In the FileSystem code, we accidentally try to DNS-resolve the logical name before it
> is converted to an actual domain name. In some DNS setups, this can cause a big slowdown:
> e.g., in one misconfigured cluster we saw a 2-3x drop in terasort throughput, since every task
> wasted a lot of time waiting for slow "not found" responses from DNS.
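The problem can be illustrated with a hypothetical canonicalizer that checks whether a URI's host is a logical name before consulting DNS. The class name, the `logicalNames` set, and the pluggable `resolver` function are assumptions for illustration, not Hadoop APIs; this is one way to avoid the wasted lookup, not necessarily how HADOOP-9150 was fixed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Hadoop's API: canonicalize a filesystem URI's host
// via DNS only when the host is a real hostname, never a logical HA name.
final class LogicalAwareCanonicalizer {
    private final Set<String> logicalNames;        // assumed: configured logical nameservice IDs
    private final UnaryOperator<String> resolver;  // assumed: host -> canonical host (may hit DNS)

    LogicalAwareCanonicalizer(Set<String> logicalNames, UnaryOperator<String> resolver) {
        this.logicalNames = logicalNames;
        this.resolver = resolver;
    }

    URI canonicalize(URI uri) {
        String host = uri.getHost();
        // Logical names have no DNS record; looking them up only waits for a
        // "not found" answer, so return such URIs unchanged.
        if (host == null || logicalNames.contains(host)) {
            return uri;
        }
        String canonicalHost = resolver.apply(host);
        try {
            // Rebuild the URI with only the host replaced.
            return new URI(uri.getScheme(), uri.getUserInfo(), canonicalHost,
                    uri.getPort(), uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot rebuild " + uri, e);
        }
    }
}
```

With this check in place, a path under `hdfs://ha-nn-uri/` never reaches the resolver, while a physical authority such as `hdfs://nn1:8020/` is still canonicalized as before.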

--
This message is automatically generated by JIRA.
