hadoop-mapreduce-dev mailing list archives

From "Ray Chiang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6685) LocalDistributedCacheManager can have overlapping filenames
Date Tue, 26 Apr 2016 18:47:12 GMT
Ray Chiang created MAPREDUCE-6685:

             Summary: LocalDistributedCacheManager can have overlapping filenames
                 Key: MAPREDUCE-6685
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6685
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 3.0.0
            Reporter: Ray Chiang
            Assignee: Ray Chiang

LocalDistributedCacheManager has this setup:

bq. AtomicLong uniqueNumberGenerator = new AtomicLong(System.currentTimeMillis());

to create this temporary filename:

bq. new FSDownload(localFSFileContext, ugi, conf, new Path(destPath,  Long.toString(uniqueNumberGenerator.incrementAndGet())),

when using LocalJobRunner. When two or more jobs start on the same machine, they can get the same
starting timestamp, or timestamps close enough that the sequences produced by incrementAndGet()
overlap, so two jobs can end up with the same temporary filename.
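The overlap can be reproduced outside Hadoop. This is a standalone illustration, not Hadoop code: names() is a hypothetical helper that mimics the quoted AtomicLong pattern, and the start times are made-up values for two jobs launched 1 ms apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class OverlapDemo {
    // Hypothetical helper mimicking the quoted pattern: a counter seeded with a
    // start time and incremented once per localized resource.
    static List<String> names(long seed, int count) {
        AtomicLong gen = new AtomicLong(seed);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(Long.toString(gen.incrementAndGet()));
        }
        return out;
    }

    public static void main(String[] args) {
        long startA = 1461696432000L; // made-up currentTimeMillis() for job A
        long startB = startA + 1;     // job B starts 1 ms later
        List<String> a = names(startA, 3);
        List<String> b = names(startB, 3);

        // Names that both jobs would generate for their temporary paths.
        List<String> shared = new ArrayList<>(a);
        shared.retainAll(b);
        System.out.println("Shared names: " + shared);
    }
}
```

With these values job A produces startA+1..startA+3 and job B produces startA+2..startA+4, so two of the three names collide even though the timestamps differ.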

Given the assumptions:

1) Assume the timestamp is the same. Then the most common choice of starting seed (the timestamp itself) will also be the same.
2) Process ID will very likely be unique, but will likely be close in value.
3) Thread ID is not guaranteed to be unique.

A unique ID that uses the PID as part of its seed (in addition to the timestamp) should make a better
identifier for temporary filenames.
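One possible shape for such an identifier, as a minimal sketch assuming Java 9+ (for ProcessHandle.current().pid()). The class UniqueNameSketch and the "pid_counter" name format are hypothetical and not taken from Hadoop or from any committed fix.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: not Hadoop's LocalDistributedCacheManager.
public class UniqueNameSketch {
    private final long pid;
    private final AtomicLong counter;

    UniqueNameSketch(long pid, long startMillis) {
        this.pid = pid;
        this.counter = new AtomicLong(startMillis);
    }

    // Uses the running JVM's PID (Java 9+) and the current time as the seed.
    static UniqueNameSketch forCurrentProcess() {
        return new UniqueNameSketch(ProcessHandle.current().pid(), System.currentTimeMillis());
    }

    // The PID is kept as a separate field instead of being added to the
    // counter, so equal timestamps in different processes give different names.
    String next() {
        return pid + "_" + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        UniqueNameSketch a = new UniqueNameSketch(4242, 1461696432000L);
        UniqueNameSketch b = new UniqueNameSketch(4243, 1461696432000L);
        System.out.println(a.next()); // 4242_1461696432001
        System.out.println(b.next()); // 4243_1461696432001
    }
}
```

Keeping the PID as its own field matters because of assumption 2: adding adjacent PIDs to the same timestamp would yield seeds only 1 apart, which overlap exactly as before. This sketch only separates different processes; jobs running inside one JVM share a PID and would still need a shared counter.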

This message was sent by Atlassian JIRA
