sqoop-dev mailing list archives

From "Boglarka Egyed (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1086) Running multiple incremental sqoop jobs in parallel resets the first sqoop job's --last-value
Date Tue, 26 Jul 2016 13:06:20 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393756#comment-15393756 ]

Boglarka Egyed commented on SQOOP-1086:

We have reviewed the situation with Attila Szabo and Szabolcs Vasas and agreed that
there is no code fix option for this. The problem arises because, when the built-in
metastore is used, Sqoop writes the INSERT/UPDATE related information into the
/var/lib/hadoop-hdfs/.sqoop/metastore.db.script HSQL dump-like file, so parallel job
execution cannot be handled properly. We suggest opening a Documentation JIRA ticket
instead, with a recommendation to use a shared metastore.
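
A minimal sketch of what that recommendation could look like in practice, assuming a shared metastore has already been started with `sqoop metastore` on a host reachable as metastore.example.com (a hypothetical name) and listening on port 16000, Sqoop's default metastore port. The `shared_job` helper and the `SQOOP` and `META_URL` variables are illustrative names, not part of Sqoop; the job names are the ones from the report quoted below.

```shell
#!/bin/sh
# Hypothetical shared-metastore URL: the host is a placeholder,
# 16000 is Sqoop's default metastore port.
META_URL="${META_URL:-jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop}"

# Defaults to a dry run that only prints each command.
# Set SQOOP="$ENV_SQOOP_HOME/bin/sqoop" to actually run the jobs.
SQOOP="${SQOOP:-echo sqoop}"

# Run `sqoop job` against the shared metastore instead of the
# per-user local metastore.db files.
shared_job() {
    $SQOOP job --meta-connect "$META_URL" "$@"
}

# Execute the two saved jobs in parallel; each one reads and updates
# its own --last-value through the shared metastore service.
shared_job --exec import-events &
shared_job --exec import-impressions &
wait
```

Jobs saved in the local metastore are not visible through the shared one, so in this setup both jobs would have to be re-created with the same `--meta-connect` URL before being executed this way.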

> Running multiple incremental sqoop jobs in parallel resets the first sqoop job's --last-value
> ---------------------------------------------------------------------------------------------
>                 Key: SQOOP-1086
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1086
>             Project: Sqoop
>          Issue Type: Bug
>    Affects Versions: 1.4.0-incubating
>         Environment: Ubuntu 12.04.2
>            Reporter: Byron
>            Assignee: Boglarka Egyed
>            Priority: Critical
>              Labels: import, incremental, job, parallel
> I've created 2 jobs (different names) that pull from the same database (MSSQL), but 2 different tables.
> They both use incremental append.
> If I run the jobs in sequence, I got no issue and the meta store for both jobs remembers the --last-value per job.
> If I run the jobs in parallel, when the 1st job finished the meta is updated with the --last-value correctly, but once the 2nd job finished the 1st job's meta --last-value is reset.
> First Job
> # create the import job into the incremental table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 1" --create "import-events" -- import --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE1" --split-by "dtmDBDateTime" --target-dir "$OUTPUT1" --incremental append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" --fields-terminated-by \\t --null-string '' --null-non-string '';
> Second Job
> # create the import job into the table
> $ENV_SQOOP_HOME/bin/sqoop job -D mapred.job.name="Job 2" --create "import-impressions" -- import --connect "$ENV_TRACKING_CONNECTION" --table "$TABLE2" --split-by "dtmDBDateTime" --target-dir "$OUTPUT2" --incremental append --check-column "dtmDBDateTime" --last-value "2012-01-01 00:00:00.000" --fields-terminated-by \\t --null-string '' --null-non-string '';

This message was sent by Atlassian JIRA
