hadoop-mapreduce-dev mailing list archives

From "Rares Vernica (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2004) IP address vs host name in updating Counter.DATA_LOCAL_MAPS
Date Wed, 11 Aug 2010 17:20:16 GMT
IP address vs host name in updating Counter.DATA_LOCAL_MAPS

                 Key: MAPREDUCE-2004
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2004
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: jobtracker
    Affects Versions: 0.20.2
            Reporter: Rares Vernica
            Priority: Minor


I set "mapred.task.cache.levels" to 1 so that I have only
data-local map tasks. Still, judging by the data-local-maps counter,
it seems that not all map tasks are local. I checked each map task
to see where it ran and which split had been assigned to it, and all
the maps were actually processing only local data. (BTW, replication
was set to 1.)

I looked into the JobClient to see what information is there for each
split. For each file, the first n-1 splits have an IP address as
location, while the n-th split has a host name as location. The reason
for this is that the location for the first n-1 splits is decided on a
different code path than the location for the n-th split. The maps that
processed the splits where the location was a host name were counted
as data-local-maps, while the others were not.

So, regardless of whether the JobClient gives an IP address or a host
name for a split, the job works fine. The problem is that the
data-local-maps counter does not take the two forms into account, so
it undercounts the local maps.
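
The mismatch can be illustrated with a minimal standalone sketch. It
is not the JobTracker code; it only assumes that locality is decided
by comparing a split's location string with the tracker's host name.
The class LocalityCheck, its methods, the ipToHost table, and the
sample host and address are all hypothetical names made up for the
illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocalityCheck {

    // Naive check (assumed behavior): a split counts as local only if one
    // of its location strings equals the tracker's host name exactly.
    static boolean isLocalNaive(String trackerHost, List<String> splitLocations) {
        return splitLocations.contains(trackerHost);
    }

    // Normalizing check: translate each location to a canonical host name
    // before comparing. The ipToHost table is a hypothetical stand-in for
    // whatever name resolution a real cluster would use.
    static boolean isLocalNormalized(String trackerHost,
                                     List<String> splitLocations,
                                     Map<String, String> ipToHost) {
        for (String location : splitLocations) {
            String canonical = ipToHost.getOrDefault(location, location);
            if (canonical.equals(trackerHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String trackerHost = "node1.example.com";          // made-up host
        Map<String, String> ipToHost = new HashMap<>();
        ipToHost.put("10.0.0.5", "node1.example.com");     // made-up address

        // One of the first n-1 splits: location reported as an IP address.
        List<String> ipSplit = Arrays.asList("10.0.0.5");
        // The n-th split: location reported as a host name.
        List<String> hostSplit = Arrays.asList("node1.example.com");

        System.out.println(isLocalNaive(trackerHost, ipSplit));                 // false
        System.out.println(isLocalNaive(trackerHost, hostSplit));               // true
        System.out.println(isLocalNormalized(trackerHost, ipSplit, ipToHost));  // true
    }
}
```

Under this assumption both splits live on the same node, but only the
one reported by host name is counted as local unless the two forms
are normalized before the comparison.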


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
