hadoop-mapreduce-dev mailing list archives

From "Stan Rosenberg (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' suffix
Date Tue, 14 May 2013 15:37:18 GMT
Stan Rosenberg created MAPREDUCE-5247:

             Summary: FileInputFormat should filter files with '._COPYING_' suffix
                 Key: MAPREDUCE-5247
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Stan Rosenberg

FsShell copy/put creates staging files with a '._COPYING_' suffix.  FileInputFormat should
treat these files as hidden.  (A simple fix is to add the following conjunct to the existing
hiddenFilter: !name.endsWith("._COPYING_").)
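
The proposed conjunct can be sketched in plain Java, without any Hadoop dependency. This is
only an illustration: the class name HiddenFilterSketch and the sample file names are
hypothetical, and the two startsWith checks are an assumption about what the existing
hidden-file filter already tests (names beginning with "_" or ".").

```java
public class HiddenFilterSketch {
    // Suffix that FsShell copy/put appends to staging files, per this report.
    static final String COPYING_SUFFIX = "._COPYING_";

    // Returns true when a file name should be treated as job input.
    // The first two checks are the assumed existing behavior; the third
    // is the conjunct proposed in this issue.
    static boolean accept(String name) {
        return !name.startsWith("_")
            && !name.startsWith(".")
            && !name.endsWith(COPYING_SUFFIX);
    }

    public static void main(String[] args) {
        // Hypothetical directory listing of an hourly partition.
        String[] names = {"part-00000", "_SUCCESS", ".hidden", "events.log._COPYING_"};
        for (String name : names) {
            System.out.println(name + " -> " + (accept(name) ? "input" : "hidden"));
        }
    }
}
```

Note that a staging file such as events.log._COPYING_ starts with neither "_" nor ".", so
it would pass the two prefix checks and is only excluded by the added suffix check.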

We were bitten by this after upgrading to CDH4.2.0. We have a legacy data loader that uses
'hadoop fs -put' to load data into hourly partitions.  We also have intra-hourly jobs that
are scheduled to run several times per hour, using the same hourly partition as input.
Thus, as new data is continuously loaded, these staging files (i.e., those ending in
._COPYING_) break our jobs: a job can pick up a staging file as input, but when the
copy/put completes the staging file is moved, so the path the job saw no longer exists.

As a workaround, we've defined a custom input path filter and loaded it with "mapred.input.pathFilter.class".
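
A filter of this kind might look like the sketch below. It is not the reporter's actual
code: the class name CopyingAwarePathFilter is hypothetical, and compiling it requires the
Hadoop client libraries on the classpath. It relies only on the org.apache.hadoop.fs.PathFilter
interface and Path.getName().

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Hypothetical workaround filter: rejects FsShell staging files so that
// jobs reading a partition while it is being loaded do not see them.
public class CopyingAwarePathFilter implements PathFilter {
    private static final String COPYING_SUFFIX = "._COPYING_";

    @Override
    public boolean accept(Path path) {
        // getName() returns the final path component (the file name).
        return !path.getName().endsWith(COPYING_SUFFIX);
    }
}
```

The class would then be named in the job configuration under "mapred.input.pathFilter.class",
for example by setting that property to the fully qualified class name. The filter keeps its
implicit public no-argument constructor, on the assumption that the framework instantiates
it reflectively.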

