Zheng Shao created MAPREDUCE-6840:
-------------------------------------
Summary: Distcp to support cutoff time
Key: MAPREDUCE-6840
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distcp
Affects Versions: 2.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
Priority: Minor
To ensure consistency of datasets on HDFS, some projects, such as file formats used by Hive,
perform HDFS operations in a particular order. For example, if a file format uses an index file,
a new version of the index file will only be written to HDFS after all files mentioned by
the index are written to HDFS.
When we do distcp, it's important to preserve that consistency, so that we don't break those
file formats.
A typical solution is to create an HDFS snapshot beforehand and distcp only the snapshot.
That works well if the user has the superuser privilege needed to make the directory snapshottable.
If not, it would be beneficial to have a cutoff time for distcp, so that distcp only copies
files modified on or before that cutoff time.
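
As a rough illustration of the proposed selection rule (not the distcp implementation), the
sketch below keeps only files whose modification time is on or before a cutoff. It uses the
local filesystem through java.nio so that it runs without a cluster. The class name CutoffFilter
and the method filesModifiedOnOrBefore are hypothetical names chosen for this example. On HDFS,
the same comparison would presumably be made against FileStatus.getModificationTime().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only; not part of distcp.
public class CutoffFilter {

    /** Returns the regular files under root modified on or before cutoff, sorted by path. */
    public static List<Path> filesModifiedOnOrBefore(Path root, Instant cutoff)
            throws IOException {
        List<Path> selected = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                // "On or before" means: not strictly after the cutoff.
                if (Files.isRegularFile(p)
                        && !Files.getLastModifiedTime(p).toInstant().isAfter(cutoff)) {
                    selected.add(p);
                }
            }
        }
        Collections.sort(selected);
        return selected;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cutoff-demo");
        Path dataFile = Files.createFile(dir.resolve("data-0001"));
        Path indexFile = Files.createFile(dir.resolve("index-v2"));

        // Assumed timestamps: the data file predates the cutoff, while the
        // new index version was written one minute after it.
        Instant cutoff = Instant.parse("2017-02-01T00:00:00Z");
        Files.setLastModifiedTime(dataFile, FileTime.from(cutoff.minusSeconds(60)));
        Files.setLastModifiedTime(indexFile, FileTime.from(cutoff.plusSeconds(60)));

        // Only files at or before the cutoff would be handed to the copy step.
        for (Path p : filesModifiedOnOrBefore(dir, cutoff)) {
            System.out.println(p.getFileName());
        }
    }
}
```

In this example the newer index version is skipped, so a copy taken at the cutoff would not
contain an index that refers to files written after it.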
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org