sqoop-dev mailing list archives

From "Cheolsoo Park (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-721) Duplicating rows on export when exporting from compressed files.
Date Fri, 30 Nov 2012 03:15:59 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507060#comment-13507060 ]

Cheolsoo Park commented on SQOOP-721:
-------------------------------------

+1.

I diff'ed Sqoop's copy of {{CombineFileInputFormat.java}} against the one in Hadoop 2.0.x and confirmed that there
is exactly one difference:
{code}
154c160,163
<     return codec instanceof SplittableCompressionCodec;
---
> 
>     // Once we remove support for Hadoop < 2.0
>     //return codec instanceof SplittableCompressionCodec;
>     return false;
{code}
As far as I understand, the only impact of this difference is that compressed files won't
be split even when their codec is splittable. That costs performance (less parallelism per file)
but has no impact on correctness.
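To make that trade-off concrete, here is a minimal, Hadoop-free sketch of how a split planner might honor an `isSplitable`-style check. All names and sizes in it (`SplitPlanner`, `planSplits`, the 300 MB file, the 128 MB split size) are hypothetical; it only models the idea and is not the actual Sqoop or Hadoop `CombineFileInputFormat` code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of split planning; not the real Hadoop API.
public class SplitPlanner {

    // A byte range [start, start + length) of one input file.
    record Split(long start, long length) {}

    // Cut a file into chunks of at most maxSplitSize bytes, unless the
    // file is not splittable, in which case it becomes a single split.
    static List<Split> planSplits(long fileLength, long maxSplitSize, boolean splitable) {
        List<Split> splits = new ArrayList<>();
        if (!splitable) {
            splits.add(new Split(0, fileLength));
            return splits;
        }
        for (long start = 0; start < fileLength; start += maxSplitSize) {
            splits.add(new Split(start, Math.min(maxSplitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300L * 1024 * 1024;   // hypothetical 300 MB compressed file
        long maxSplitSize = 128L * 1024 * 1024; // hypothetical 128 MB split size

        // Splittable codec: the file can be read by three parallel tasks.
        System.out.println(planSplits(fileLength, maxSplitSize, true).size());  // 3
        // Check hard-wired to false: one task reads the whole file.
        System.out.println(planSplits(fileLength, maxSplitSize, false).size()); // 1
    }
}
```

Returning `false` keeps every file in one piece, which is always safe; the price is lost parallelism for codecs that could have been split (bzip2, for example).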

I didn't run any tests with this patch, but given that it is identical to what was committed
in MAPREDUCE-1597, I think it is fine. Please let me know if anyone has any concerns.

Thanks!
                
> Duplicating rows on export when exporting from compressed files.
> ----------------------------------------------------------------
>
>                 Key: SQOOP-721
>                 URL: https://issues.apache.org/jira/browse/SQOOP-721
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.2
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Jarek Jarcec Cecho
>            Priority: Blocker
>         Attachments: bugSQOOP-721.patch, bugSQOOP-721.patch
>
>
> It appears that in some situations export will duplicate rows. This seems to happen
> when a user is exporting compressed files that are "big enough".
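One plausible mechanism, consistent with the fix taken from MAPREDUCE-1597: a non-splittable compressed stream (gzip, for example) can only be decoded from its first byte, so if a "big enough" file is nonetheless cut into several splits, each split's reader may decompress the whole file and emit its rows again. The simplified sketch below models that with hypothetical names (`DuplicateRowsDemo`, `readSplit`, `export`) and in-memory data instead of Hadoop classes; it illustrates the effect and is not Sqoop's actual record reader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of reading splits of a non-splittable compressed file.
public class DuplicateRowsDemo {

    // Rows stored in one compressed file (hypothetical data).
    static final List<String> ROWS = List.of("row1", "row2", "row3");

    // A reader for a non-splittable stream cannot seek to the split's start
    // offset, so it ignores the split bounds and decodes every row.
    static List<String> readSplit(long splitStart, long splitLength) {
        return new ArrayList<>(ROWS);
    }

    // Run one reader per split and collect everything that would be exported.
    static List<String> export(int numSplits) {
        List<String> exported = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            exported.addAll(readSplit(i * 100L, 100L));
        }
        return exported;
    }

    public static void main(String[] args) {
        // File cut into 3 splits: every row is exported three times.
        System.out.println(export(3).size()); // 9
        // Split check returns false: one split, each row exported once.
        System.out.println(export(1).size()); // 3
    }
}
```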

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
