sqoop-dev mailing list archives

From "Veena Basavaraj (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SQOOP-1799) Connector API : Ability for connector to indicate if its FROM and TO support incremental reading/ writing
Date Thu, 15 Jan 2015 17:24:34 GMT

     [ https://issues.apache.org/jira/browse/SQOOP-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Veena Basavaraj updated SQOOP-1799:
-----------------------------------
    Description: 


No longer a necessity. If the connectors have delta read/write configs, we will display them,
and the connectors will use those config values to perform the appropriate form of reading from
and writing to the data source. At this point, having this in the Initializer API does not seem
necessary; we can revisit if we need this information up front for any form of validation when
the job is created.

By default it is assumed the connectors will do a full fetch and a full write from a clean slate.
For instance, if the TO does not support writing delta records in some fashion, but the
FROM side only gave a subset of records, we cannot expect a delta append or merge (overwriting
existing records with no dupes) to happen.
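
To make the distinction concrete, here is a minimal sketch of a full write versus a delta merge, using in-memory maps keyed by a record id. The class and method names (DeltaWriteSketch, fullWrite, deltaMerge) are hypothetical and exist only for illustration; they are not part of the Sqoop connector API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names belong to the Sqoop connector API.
final class DeltaWriteSketch {

  // Full write from a clean slate: the target ends up holding exactly the
  // incoming records, so anything that was there before is gone.
  static Map<Integer, String> fullWrite(Map<Integer, String> incoming) {
    return new LinkedHashMap<>(incoming);
  }

  // Delta merge: incoming records overwrite existing records with the same key
  // and new keys are added, so the target never holds duplicates.
  static Map<Integer, String> deltaMerge(Map<Integer, String> target,
      Map<Integer, String> delta) {
    Map<Integer, String> merged = new LinkedHashMap<>(target);
    merged.putAll(delta);
    return merged;
  }
}
```

If the FROM side delivers only a subset of records and the TO side can only perform the full write, the records missing from the subset are lost, which is why a delta read needs a TO that can merge or append.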

  was:
One suggestion would be to have a connector's FROM/TO initializer expose whether it even supports
incremental. This can then be used to validate the job immediately at creation.

{code}
 sqoop > create incremental-job -f 1 -t 2 
{code}

HDFS FROM supporting incremental read? Does this even apply? But surely the TO side should
support the delta/incremental write.

Both the FROM connector and the TO connector have to support this feature before we proceed. The
default will be false. The Initializer API will be updated to support this.


{code}

import java.util.LinkedList;
import java.util.List;

import org.apache.sqoop.schema.NullSchema;
import org.apache.sqoop.schema.Schema;

/**
 * This allows a connector to define initialization work for execution,
 * for example, context configuration.
 */
public abstract class Initializer<LinkConfiguration, JobConfiguration> {

  /**
   * Initialize a new submission based on the given configuration properties. Any
   * needed temporary values might be saved to the context object and they will be
   * promoted to all other parts of the workflow automatically.
   *
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO
   *        In case of the FROM initializer this will represent the FROM job configuration
   *        In case of the TO initializer this will represent the TO job configuration
   */
  public abstract void initialize(InitializerContext context, LinkConfiguration linkConfiguration,
      JobConfiguration jobConfiguration);

  /**
   * Return the list of all jars that this particular connector needs in order to
   * run the following job. This method will be called after the initialize method.
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO
   *        In case of the FROM initializer this will represent the FROM job configuration
   *        In case of the TO initializer this will represent the TO job configuration
   * @return list of jars required by the connector; empty by default
   */
  public List<String> getJars(InitializerContext context, LinkConfiguration linkConfiguration,
      JobConfiguration jobConfiguration) {
    return new LinkedList<String>();
  }

  /**
   * Return the schema associated with the connector for FROM and TO.
   * By default we assume a null schema. Override the method if there is a custom
   * schema to provide for either FROM or TO.
   * @param context Initializer context object
   * @param linkConfiguration link configuration object
   * @param jobConfiguration job configuration object for the FROM and TO
   *        In case of the FROM initializer this will represent the FROM job configuration
   *        In case of the TO initializer this will represent the TO job configuration
   * @return the connector's schema; a NullSchema instance by default
   */
  public Schema getSchema(InitializerContext context, LinkConfiguration linkConfiguration,
      JobConfiguration jobConfiguration) {
    return NullSchema.getInstance();
  }
}

{code}
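
For reference, the earlier proposal (an initializer-level flag that defaults to false and is checked on both sides at job creation) could be sketched roughly as follows. This is only an illustration under stated assumptions: supportsIncremental(), IncrementalAwareInitializer and IncrementalJobValidator are hypothetical names, and nothing like them was added to the Initializer API shown above.

```java
// Hypothetical sketch: supportsIncremental(), IncrementalAwareInitializer and
// IncrementalJobValidator are illustrative names, not part of the Sqoop API.
abstract class IncrementalAwareInitializer {

  // Defaults to false, as the proposal states; a connector that can do
  // incremental reads or writes would opt in by overriding this method.
  boolean supportsIncremental() {
    return false;
  }
}

final class IncrementalJobValidator {

  // An incremental job is only valid when both the FROM side and the TO side
  // declare support for it.
  static boolean canCreateIncrementalJob(IncrementalAwareInitializer from,
      IncrementalAwareInitializer to) {
    return from.supportsIncremental() && to.supportsIncremental();
  }
}
```

With a check like this, a request such as the create incremental-job command above could be rejected immediately when either connector keeps the default.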



> Connector API : Ability for connector to indicate if its FROM and TO support incremental
reading/ writing
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: SQOOP-1799
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1799
>             Project: Sqoop
>          Issue Type: Sub-task
>            Reporter: Veena Basavaraj
>            Assignee: Veena Basavaraj
>             Fix For: 1.99.5
>
>
> No longer a necessity. If the connectors have delta read/write configs, we will display
them, and the connectors will use those config values to perform the appropriate form of reading
from and writing to the data source. At this point, having this in the Initializer API does not
seem necessary; we can revisit if we need this information up front for any form of validation
when the job is created.
> By default it is assumed the connectors will do a full fetch and a full write from a clean
slate.
> For instance, if the TO does not support writing delta records in some fashion,
but the FROM side only gave a subset of records, we cannot expect a delta append or merge
(overwriting existing records with no dupes) to happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
