sqoop-dev mailing list archives

From "Gwen Shapira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1168) Sqoop2: Incremental Import
Date Tue, 05 Aug 2014 00:05:11 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085527#comment-14085527 ]

Gwen Shapira commented on SQOOP-1168:
-------------------------------------

Here's the approach I'm thinking of taking:


Incremental will be supported at the Connector level.
I.e., connectors can decide whether or not to support incremental. For HDFS, for example, incremental
doesn't make sense. For MongoDB, incremental, if supported, will look very different than for
JDBC.

Incremental will be supported for the Extract part of the job only.

To support incremental queries in the JDBC connector we need a few new values in the ImportTableForm
(part of the ImportJobConfiguration):
* isIncremental (Y/N): not sure it's actually needed, maybe it's enough to check whether incrementalColumn is set?
* incrementalColumn: hope to support expressions / functions as well as actual columns
* lastValue: the first time it can be given by the user, or we can have a default (get everything?
get nothing?). For later runs it should be captured from the output of the previous run.
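
To make the open question about isIncremental concrete, here is a minimal sketch of how the three values could hang together if the flag is derived from incrementalColumn rather than stored. The class below is a hypothetical stand-in written in Python for brevity, not Sqoop's real ImportTableForm, and the single validation rule is only an example of the kind of check a form could carry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImportTableForm:
    """Hypothetical stand-in for the JDBC connector's table form (not Sqoop's class)."""
    table_name: str
    incremental_column: Optional[str] = None  # column or expression to compare on
    last_value: Optional[str] = None          # bound reached by the previous run

    @property
    def is_incremental(self) -> bool:
        # Derived, not stored: the job is incremental iff a column is configured.
        return self.incremental_column is not None

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the form is valid."""
        problems = []
        if self.last_value is not None and self.incremental_column is None:
            problems.append("lastValue is set but incrementalColumn is missing")
        return problems
```

With this shape a plain `ImportTableForm("orders")` describes a full import, and setting `incremental_column="id"` is enough to switch the job to incremental without a separate Y/N field.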

There’s obviously a number of verifications and display-conditions we can implement here.

The change should include:

On the connector side:
- If the job is incremental (or perhaps if incrementalColumn is not null):
     - the maximum value of incrementalColumn should be captured from the DB before the execution
starts (select max(incrementalColumn) from table where 1=1) and stored in the repository for reuse
as lastValue in the next execution.
     - the extract query should have an “incrementalColumn > lastValue and incrementalColumn
<= captured maximum” condition. The upper bound has to be inclusive: the captured maximum becomes
the next lastValue, so rows equal to it would otherwise never be imported.
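
The connector-side steps above can be sketched end to end. This is an illustration only: it uses Python and an in-memory SQLite database instead of the JDBC connector, the function names are made up, and it assumes two things the comment leaves open, namely that a first run without lastValue imports everything and that the upper bound of the window is inclusive.

```python
import sqlite3


def capture_max(conn, table, column):
    """Read the current maximum of the incremental column before extraction starts."""
    # Identifiers are interpolated directly; a real connector would need to
    # validate or quote them.
    return conn.execute(f"SELECT MAX({column}) FROM {table}").fetchone()[0]


def extract_increment(conn, table, column, last_value, new_max):
    """Fetch the rows in the window (last_value, new_max]."""
    if new_max is None:      # empty table: nothing to extract
        return []
    if last_value is None:   # first run, assuming the default is "get everything"
        sql = f"SELECT * FROM {table} WHERE {column} <= ? ORDER BY {column}"
        params = (new_max,)
    else:
        sql = (f"SELECT * FROM {table} "
               f"WHERE {column} > ? AND {column} <= ? ORDER BY {column}")
        params = (last_value, new_max)
    return conn.execute(sql, params).fetchall()


def run_job(conn, table, column, last_value):
    """One execution: returns the extracted rows and the value to store for next time."""
    new_max = capture_max(conn, table, column)
    rows = extract_increment(conn, table, column, last_value, new_max)
    return rows, (new_max if new_max is not None else last_value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# First run: no lastValue yet, so everything up to the captured maximum is read.
first_rows, last_value = run_job(conn, "orders", "id", None)

# New rows arrive between executions.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(4, "d"), (5, "e")])

# Second run: only rows above the stored lastValue are read.
second_rows, last_value = run_job(conn, "orders", "id", last_value)
```

Capturing the maximum before the extract starts is what keeps the window stable: rows inserted while the job is running fall above the captured bound and are picked up by the next execution rather than being read twice or missed.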

lastValue can be stored in the job (as part of the form) or in the submission, if we give a job
a way to get the last submission.
We don’t want it to be a fixed field, since who knows what else connectors will need.

I think the best option is for connectors to modify the job-connector form and update the
lastValue field themselves, which it looks like they can do.
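
A rough sketch of that idea, with the job's connector forms modelled as a plain dictionary in Python. The structure, the `finish_run` helper, and every key name here are hypothetical; the point is only that each connector writes whatever it needs into its own section, so nothing like lastValue has to be a fixed field of the job.

```python
def finish_run(job_forms: dict, connector: str, new_values: dict) -> dict:
    """Merge the values a connector wants to remember into its own form section."""
    # Copy each section so the caller's original forms are left untouched.
    updated = {name: dict(values) for name, values in job_forms.items()}
    updated.setdefault(connector, {}).update(new_values)
    return updated


job_forms = {"jdbc": {"table": "orders", "incrementalColumn": "id", "lastValue": None}}

# After a successful run the JDBC connector records the bound it reached.
job_forms = finish_run(job_forms, "jdbc", {"lastValue": "1500"})
```

A different connector could store a differently named value in its own section the same way, which is also where the naming inconsistency discussed next comes from.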

The main downside of this approach is that each connector can have slightly different parameter
names for a feature that does basically the same thing, which will be pretty confusing for
the users. But we already have this issue for common terms like "table"...

> Sqoop2: Incremental Import
> --------------------------
>
>                 Key: SQOOP-1168
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1168
>             Project: Sqoop
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>            Assignee: Hari Shreedharan
>
> Initial plan is to follow roughly the same design as Sqoop 1, except provide pluggability
> to start this through a REST API.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
