sqoop-dev mailing list archives

From "Tim Howe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-1078) incremental import from database in direct mode
Date Thu, 13 Jun 2013 18:19:21 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Howe updated SQOOP-1078:

    Attachment: sqoop-incremental-direct.patch

I've written a patch which causes direct imports to use the same file-naming convention used
elsewhere. Attached please also find some changes to AppendUtils which improve resiliency, especially
if there happen to be multiple concurrent operations on the same table. This patch is against
sqoop-1.3.0-cdh3u3 but seems to apply and build with minimal changes across the whole 1.x line.
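To illustrate the idea of the patch (not its actual code), here is a minimal sketch that maps a direct-mode output name of the form data-nnnnn onto the part-m-nnnnn form that the rest of Sqoop expects. The class name, the helper `toPartName`, and the assumption of exactly five partition digits are all hypothetical, chosen only for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DirectNameMapper {
    // Hypothetical pattern for direct-mode output names such as "data-00003"
    // (assumes a five-digit, zero-padded partition number).
    private static final Pattern DIRECT_NAME = Pattern.compile("data-(\\d{5})");

    // Maps "data-nnnnn" to "part-m-nnnnn"; any other name is returned unchanged.
    static String toPartName(String fileName) {
        Matcher m = DIRECT_NAME.matcher(fileName);
        if (m.matches()) {
            return "part-m-" + m.group(1);
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(toPartName("data-00003"));   // prints part-m-00003
        System.out.println(toPartName("part-m-00001")); // prints part-m-00001
    }
}
```

Keeping the partition number intact means downstream code that parses it out of the name still sees the same value.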

Note: I don't know where the "part-m-nnnnn" naming comes from or whether the "-m" signifies anything.
I did hunt around for the code which creates those files, but with no luck.
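For context, the name appears to come from Hadoop rather than Sqoop: the new-API (`org.apache.hadoop.mapreduce`) `FileOutputFormat.getUniqueFile` builds each task's output name as `<name>-<m|r>-<nnnnn>`, where `m` marks a map task, `r` a reduce task, and `nnnnn` is the zero-padded partition number. Sqoop imports run as map-only jobs, which would explain the `-m`. The sketch below only mimics that format; `PartFileNames` and `partFileName` are hypothetical names, not Hadoop's API.

```java
class PartFileNames {
    // Builds "<prefix>-<m|r>-<nnnnn>", imitating the Hadoop output-file convention.
    static String partFileName(String prefix, boolean isMap, int partition) {
        char type = isMap ? 'm' : 'r';
        return String.format("%s-%c-%05d", prefix, type, partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName("part", true, 7));   // prints part-m-00007
        System.out.println(partFileName("part", false, 12)); // prints part-r-00012
    }
}
```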

> incremental import from database in direct mode
> -----------------------------------------------
>                 Key: SQOOP-1078
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1078
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors, connectors/mysql, connectors/postgresql
>    Affects Versions: 1.3.0, 1.4.2, 1.4.3
>            Reporter: Tim Howe
>            Priority: Minor
>         Attachments: sqoop-incremental-direct.patch
> A problem exists in Sqoop's incremental import, namely that any imports
> after the first report success, but the data never appears.
> A temporary file is created on HDFS with the data, but it is deleted upon
> completion rather than being moved into place.
> It turns out to be a conflict between the "direct mode" database
> managers and "incremental mode" import. Ordinarily, Sqoop ends up
> creating files named part-m-nnnnn, where nnnnn is an incrementing file
> partition number. However, the direct-mode importer creates files of
> the form data-nnnnn. This poses a problem because AppendUtils, which
> is used to move files into place at the end of a direct import, only
> copies files which match the part-m-nnnnn format and discards the
> rest.
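The failure mode described above can be modelled with a simplified filter. This is a sketch of the described behaviour under stated assumptions, not AppendUtils' real code: the class `AppendFilterSketch`, the method `filesToMove`, and the exact regex are hypothetical. Files that pass the filter would be moved into the target directory; everything else would be left in the temporary directory and deleted with it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class AppendFilterSketch {
    // Hypothetical filter: only names of the form part-m-nnnnn are accepted.
    private static final Pattern PART_ONLY = Pattern.compile("part-m-\\d{5}");

    // Returns the names the filter would move into place; the rest are dropped.
    static List<String> filesToMove(List<String> fileNames) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (PART_ONLY.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> regular = Arrays.asList("part-m-00000", "part-m-00001", "_SUCCESS");
        List<String> direct = Arrays.asList("data-00000", "data-00001");
        System.out.println(filesToMove(regular)); // prints [part-m-00000, part-m-00001]
        System.out.println(filesToMove(direct));  // prints [] : the direct-mode data is lost
    }
}
```

Either side of the mismatch can be fixed: make direct imports emit part-m-nnnnn names, or widen the filter to accept data-nnnnn as well.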

