hadoop-common-issues mailing list archives

From "Suraj Nayak (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
Date Thu, 05 May 2016 00:52:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271724#comment-15271724 ]

Suraj Nayak commented on HADOOP-8065:
-------------------------------------

Thanks [~raviprak] for your review and thoughts.

1. AFAIK, yes, because the checksum is checked after the copy. It will never match, since the source
checksum differs from the target checksum after compression, and the job fails because of this checksum
mismatch.
{{-skipcrccheck}} can be used with the {{-update}} option, whereas this feature does not support
the {{-update}} option.
2. Typo :) will correct it.
3. Will change it to:
{code}LOG.error("Compression class " + compressionCodecClass
          + " not found in classpath", e);
{code}
4. This is a very good suggestion. Incorporating this will save/reuse codecs.
5. It's a good option. Another way of looking at it is that users of the Hadoop Tool interface already
set the codec with the {{-D}} option, so I thought {{-compressOutput}} would be additional overhead
for the user to remember. But yes, it makes the user's life easier to just set {{-compressOutput
org.apache.hadoop.io.compress.BZip2Codec}}. Will add this option.
6. Yes. I will create a JIRA with a new description and attach the modified patch there.
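
To illustrate point 1, here is a minimal sketch of why a post-copy checksum comparison cannot succeed once the target is compressed. It uses only JDK classes ({{java.util.zip.CRC32}} and {{GZIPOutputStream}}) as stand-ins. It is not the HDFS file checksum and not the code in the patch, and the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration class; not part of distcp or the patch.
public class ChecksumAfterCompression {

    /** CRC32 over a byte array, used here as a stand-in for a file checksum. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    /** Gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // "source" plays the role of the uncompressed file, "target" the compressed copy.
        byte[] source = "some file contents copied by distcp\n".getBytes(StandardCharsets.UTF_8);
        byte[] target = gzip(source);

        // The checksums are computed over different byte sequences,
        // so comparing them says nothing about whether the copy succeeded.
        System.out.println("source crc32 = " + Long.toHexString(crc32(source)));
        System.out.println("target crc32 = " + Long.toHexString(crc32(target)));
        System.out.println("match = " + (crc32(source) == crc32(target)));
    }
}
```

The stored bytes differ between source and target, so any checksum over the stored bytes differs as well, which is why the check has to be skipped when compression is enabled.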


> distcp should have an option to compress data while copying.
> ------------------------------------------------------------
>
>                 Key: HADOOP-8065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.2
>            Reporter: Suresh Antony
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 0.20.2
>
>         Attachments: HADOOP-8065-trunk_2015-11-03.patch, HADOOP-8065-trunk_2015-11-04.patch,
HADOOP-8065-trunk_2016-04-29-4.patch, patch.distcp.2012-02-10
>
>
> We would like to compress the data while transferring it from our source system to the target system.
One way to do this is to write a map/reduce job that compresses the data before or after the transfer.
This looks inefficient.
> Since distcp is already reading and writing the data, it would be better if it could compress the data while
doing so.
> The flip side is that the distcp -update option cannot check the file size before copying
data. It can only check whether the file exists.
> So I propose that if the -compress option is given, the file size is not checked.
> Also, when a file is copied, the appropriate extension needs to be added to the file name, depending on the compression
type.
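
The last point of the description (adding an extension that depends on the compression type) could be sketched roughly as below. This is an assumption-laden illustration, not the patch: {{CompressedTargetName}} and {{targetName}} are hypothetical names, and the lookup table is hard-coded so the example runs without Hadoop on the classpath. In real Hadoop code the extension would more likely come from {{CompressionCodec.getDefaultExtension()}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of distcp or the attached patches.
public class CompressedTargetName {

    // Hard-coded for the sketch: default extensions of a few Hadoop codecs.
    private static final Map<String, String> EXTENSIONS = new LinkedHashMap<>();
    static {
        EXTENSIONS.put("org.apache.hadoop.io.compress.GzipCodec", ".gz");
        EXTENSIONS.put("org.apache.hadoop.io.compress.BZip2Codec", ".bz2");
        EXTENSIONS.put("org.apache.hadoop.io.compress.DefaultCodec", ".deflate");
    }

    /**
     * Returns the target file name for a copy. With no codec the name is
     * unchanged; otherwise the codec's extension is appended.
     */
    static String targetName(String sourceName, String codecClassName) {
        if (codecClassName == null) {
            return sourceName; // plain copy, no compression requested
        }
        String extension = EXTENSIONS.get(codecClassName);
        if (extension == null) {
            throw new IllegalArgumentException("Unknown codec class: " + codecClassName);
        }
        return sourceName + extension;
    }

    public static void main(String[] args) {
        System.out.println(targetName("part-00000", "org.apache.hadoop.io.compress.BZip2Codec"));
        System.out.println(targetName("part-00000", null));
    }
}
```

Because the target name differs from the source name, an existence check on the target has to apply the same mapping before it looks the file up.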



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
