hadoop-common-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-8065) distcp should have an option to compress data while copying.
Date Thu, 05 May 2016 21:48:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273160#comment-15273160 ]

Ravi Prakash commented on HADOOP-8065:
--------------------------------------

Thanks Suraj!
In CopyMapper, you are declaring {{codec}}, assigning it a value and then never using it.
Are you sure you need those changes? Maybe you are missing some part of the patch? I am looking
at [HADOOP-8065-trunk_2016-04-29-4.patch|https://issues.apache.org/jira/secure/attachment/12801507/HADOOP-8065-trunk_2016-04-29-4.patch].
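For reference, here is a minimal sketch of what actually using a codec in the copy loop could look like: the target stream is wrapped in a compressing stream, and every buffer read from the source is written through it. It assumes java.util.zip.GZIPOutputStream as a stand-in for the stream that Hadoop's CompressionCodec#createOutputStream(OutputStream) would return, so it runs without Hadoop on the classpath. The class and method names are hypothetical and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: compressing while copying.
 * GZIPOutputStream stands in for the stream a Hadoop CompressionCodec
 * would return from createOutputStream(OutputStream).
 */
final class CompressingCopySketch {
    private static final int BUFFER_SIZE = 8192;

    /** Copies all bytes from in to out, compressing them on the way; returns bytes read. */
    static long copyCompressed(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        // Wrap the target stream so every write is compressed before it reaches out.
        try (OutputStream compressed = new GZIPOutputStream(out, BUFFER_SIZE)) {
            byte[] buf = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buf)) != -1) {
                compressed.write(buf, 0, n);
                total += n;
            }
        } // close() writes the gzip trailer and also closes out.
        return total;
    }

    /** Decompresses a gzip byte array; used here only to check the round trip. */
    static byte[] decompress(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello distcp ".repeat(100).getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        long read = copyCompressed(new ByteArrayInputStream(original), target);
        System.out.println(read + " bytes read, " + target.size() + " bytes written");
    }
}
```

Note that the returned count is the number of uncompressed bytes read from the source, so it will not match the length of the compressed target file.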

Enabling compression during transit is a MUCH bigger Epic. We may have to change [FileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L769],
and [BlockSender|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java]
amongst others (on the datanode side). A lot more people will also have an opinion on it, and
it's probably a multi-month effort. Striped blocks may also make it more complicated. People
may argue that users should compress and decompress at the application level. It would simply be
far more complicated than what we are trying to do here. I suggest we tackle that once this
problem is solved.

> distcp should have an option to compress data while copying.
> ------------------------------------------------------------
>
>                 Key: HADOOP-8065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.20.2
>            Reporter: Suresh Antony
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 0.20.2
>
>         Attachments: HADOOP-8065-trunk_2015-11-03.patch, HADOOP-8065-trunk_2015-11-04.patch, HADOOP-8065-trunk_2016-04-29-4.patch, patch.distcp.2012-02-10
>
>
> We would like to compress the data while transferring it from our source system to the target system.
> One way to do this is to write a map/reduce job that compresses the data before or after it is transferred,
> but this looks inefficient.
> Since distcp is already reading and writing the data, it would be better if it could compress it while doing so.
> The flip side is that the distcp -update option cannot check the file size before copying data;
> it can only check for the existence of the file.
> So I propose that if the -compress option is given, the file size is not checked.
> Also, when a file is copied, the appropriate extension needs to be added to it depending on the compression type.
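The two rules proposed in the description can be sketched as follows. This is a hypothetical illustration only: none of these names exist in distcp, the -update check is reduced to "target exists and lengths match" as described above, and the extension table is an assumption that mirrors the default extensions of Hadoop's built-in codecs (GzipCodec, BZip2Codec, DefaultCodec, SnappyCodec, Lz4Codec).

```java
/**
 * Hypothetical sketch of the proposed -compress rules.
 * Names and the extension table are assumptions, not distcp code.
 */
final class CompressOptionSketch {

    /** Compression types with the file extension each would append (assumed). */
    enum CompressionType {
        GZIP(".gz"), BZIP2(".bz2"), DEFLATE(".deflate"), SNAPPY(".snappy"), LZ4(".lz4");

        final String extension;

        CompressionType(String extension) {
            this.extension = extension;
        }
    }

    /**
     * -update decision: returns true when the copy can be skipped.
     * Without -compress, the target must exist and have the source's length.
     * With -compress, the target length cannot match the source length,
     * so only the target's existence is checked.
     */
    static boolean shouldSkip(boolean targetExists, long sourceLen, long targetLen,
                              boolean compress) {
        if (!targetExists) {
            return false;
        }
        return compress || sourceLen == targetLen;
    }

    /** Target file name with the extension for the chosen compression type appended. */
    static String targetName(String sourceName, CompressionType type) {
        return sourceName + type.extension;
    }

    public static void main(String[] args) {
        System.out.println(targetName("part-00000", CompressionType.GZIP));
        System.out.println(shouldSkip(true, 1000L, 312L, true));
        System.out.println(shouldSkip(true, 1000L, 312L, false));
    }
}
```

Running main prints part-00000.gz, true and false. The trade-off of checking existence alone is that a stale or partially written compressed target would not be recopied under -update.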



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

