Updated Branches:
refs/heads/sqoop2 c07c85758 -> d140c5ddd
SQOOP-1155: Sqoop 2 documentation for connector development
(Masatake Iwasaki via Jarek Jarcec Cecho)
Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/d140c5dd
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/d140c5dd
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/d140c5dd
Branch: refs/heads/sqoop2
Commit: d140c5ddd06f7b0179044184098b3521ab6227d0
Parents: c07c857
Author: Jarek Jarcec Cecho <jarcec@apache.org>
Authored: Thu Aug 15 08:23:54 2013 -0700
Committer: Jarek Jarcec Cecho <jarcec@apache.org>
Committed: Thu Aug 15 08:23:54 2013 -0700
----------------------------------------------------------------------
docs/src/site/sphinx/ConnectorDevelopment.rst | 250 +++++++++++++++++++++
docs/src/site/sphinx/index.rst | 1 +
2 files changed, 251 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/sqoop/blob/d140c5dd/docs/src/site/sphinx/ConnectorDevelopment.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/ConnectorDevelopment.rst b/docs/src/site/sphinx/ConnectorDevelopment.rst
new file mode 100644
index 0000000..918ca00
--- /dev/null
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -0,0 +1,250 @@
+.. Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+
+
+=============================
+Sqoop 2 Connector Development
+=============================
+
+This document describes how to implement a connector for Sqoop 2.
+
+
+What is a Connector?
+++++++++++++++++++++
+
+A connector provides interaction with external databases.
+It reads data from databases for import
+and writes data to databases for export.
+Interaction with Hadoop is taken care of by the common modules of the Sqoop 2 framework.
+
+
+Connector Implementation
+++++++++++++++++++++++++
+
+The SqoopConnector class defines the functionality
+which must be provided by connectors.
+Each connector must extend SqoopConnector and override the methods shown below.
+::
+
+ public abstract String getVersion();
+ public abstract ResourceBundle getBundle(Locale locale);
+ public abstract Class getConnectionConfigurationClass();
+ public abstract Class getJobConfigurationClass(MJob.Type jobType);
+ public abstract Importer getImporter();
+ public abstract Exporter getExporter();
+ public abstract Validator getValidator();
+ public abstract MetadataUpgrader getMetadataUpgrader();
+
+The getImporter method returns an Importer_ instance,
+which is a placeholder for the modules needed for import.
+
+The getExporter method returns an Exporter_ instance,
+which is a placeholder for the modules needed for export.
+
+Methods such as getBundle, getConnectionConfigurationClass,
+getJobConfigurationClass and getValidator
+are related to `Connector configurations`_ .
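+
+A connector implementation is therefore a single class tying these pieces together.
+A minimal skeleton might look like the following sketch; the class name
+MyConnector is hypothetical, and the individual methods are discussed in the sections below.
+::
+
+  public class MyConnector extends SqoopConnector {
+
+    @Override
+    public String getVersion() {
+      // Version string of this connector; how it is obtained is up to the connector.
+      return "1.0";
+    }
+
+    @Override
+    public Importer getImporter() {
+      return IMPORTER;   // see the Importer section below
+    }
+
+    @Override
+    public Exporter getExporter() {
+      return EXPORTER;   // see the Exporter section below
+    }
+
+    // getBundle, getConnectionConfigurationClass, getJobConfigurationClass,
+    // getValidator and getMetadataUpgrader are covered under Connector Configurations.
+  }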
+
+
+Importer
+========
+
+The Connector#getImporter method returns an Importer instance,
+which is a placeholder for the modules needed for import,
+such as Partitioner_ and Extractor_ .
+The built-in GenericJdbcConnector defines its Importer like this.
+::
+
+ private static final Importer IMPORTER = new Importer(
+ GenericJdbcImportInitializer.class,
+ GenericJdbcImportPartitioner.class,
+ GenericJdbcImportExtractor.class,
+ GenericJdbcImportDestroyer.class);
+
+ ...
+
+ @Override
+ public Importer getImporter() {
+ return IMPORTER;
+ }
+
+
+Extractor
+---------
+
+The Extractor (the E in ETL) extracts data from the external database and
+writes it to the Sqoop framework for import.
+
+An Extractor must override the extract method.
+::
+
+ public abstract void extract(ExtractorContext context,
+ ConnectionConfiguration connectionConfiguration,
+ JobConfiguration jobConfiguration,
+ Partition partition);
+
+The extract method extracts data from the database in some way and
+writes it to the DataWriter (provided by the context) as the `Intermediate representation`_ .
+
+The Extractor must iterate in the extract method until the data from the database is exhausted.
+::
+
+ while (resultSet.next()) {
+ ...
+ context.getDataWriter().writeArrayRecord(array);
+ ...
+ }
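+
+Putting this together, an extract implementation for a hypothetical JDBC based
+connector might look like the following sketch.  The openConnection and
+buildQuery helpers are assumptions used only for illustration.
+::
+
+  @Override
+  public void extract(ExtractorContext context,
+                      ConnectionConfiguration connectionConfiguration,
+                      JobConfiguration jobConfiguration,
+                      Partition partition) {
+    try {
+      // Hypothetical helpers: open a JDBC connection from the connection
+      // configuration and build a query restricted to the given partition.
+      Connection connection = openConnection(connectionConfiguration);
+      String query = buildQuery(jobConfiguration, partition);
+
+      Statement statement = connection.createStatement();
+      ResultSet resultSet = statement.executeQuery(query);
+      int columns = resultSet.getMetaData().getColumnCount();
+      while (resultSet.next()) {
+        Object[] array = new Object[columns];
+        for (int i = 0; i < columns; i++) {
+          array[i] = resultSet.getObject(i + 1);
+        }
+        // Pass one record to the Sqoop framework in array form.
+        context.getDataWriter().writeArrayRecord(array);
+      }
+      resultSet.close();
+      statement.close();
+      connection.close();
+    } catch (SQLException e) {
+      throw new RuntimeException("Failed to extract data", e);
+    }
+  }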
+
+
+Partitioner
+-----------
+
+The Partitioner creates Partition instances based on configurations.
+The number of Partition instances is interpreted as the number of map tasks.
+Partition instances are passed to the Extractor_ as an argument of the extract method.
+The Extractor_ determines which portion of the data to extract from its Partition.
+
+There is no particular convention for Partition classes
+other than being Writable and toString()-able.
+::
+
+ public abstract class Partition {
+ public abstract void readFields(DataInput in) throws IOException;
+ public abstract void write(DataOutput out) throws IOException;
+ public abstract String toString();
+ }
+
+Connectors are free to design their own Partition classes.
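+
+For example, a connector that splits its input by a numeric key range could use
+a Partition class like the following sketch; the RangePartition class is hypothetical.
+::
+
+  public class RangePartition extends Partition {
+
+    private long lowerBound;
+    private long upperBound;
+
+    @Override
+    public void readFields(DataInput in) throws IOException {
+      lowerBound = in.readLong();
+      upperBound = in.readLong();
+    }
+
+    @Override
+    public void write(DataOutput out) throws IOException {
+      out.writeLong(lowerBound);
+      out.writeLong(upperBound);
+    }
+
+    @Override
+    public String toString() {
+      // Used for logging; the Extractor may also use it for debugging output.
+      return lowerBound + " <= key < " + upperBound;
+    }
+  }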
+
+
+Initializer and Destroyer
+-------------------------
+
+The Initializer is instantiated before the submission of the MapReduce job
+to do preparation such as adding dependent jar files.
+
+The Destroyer is instantiated after the MapReduce job finishes to do cleanup.
+
+
+Exporter
+========
+
+The Connector#getExporter method returns an Exporter instance,
+which is a placeholder for the modules needed for export,
+such as Loader_ .
+The built-in GenericJdbcConnector defines its Exporter like this.
+::
+
+ private static final Exporter EXPORTER = new Exporter(
+ GenericJdbcExportInitializer.class,
+ GenericJdbcExportLoader.class,
+ GenericJdbcExportDestroyer.class);
+
+ ...
+
+ @Override
+ public Exporter getExporter() {
+ return EXPORTER;
+ }
+
+
+Loader
+------
+
+The Loader (the L in ETL) receives data from the Sqoop framework and
+loads it into the external database.
+
+A Loader must override the load method.
+::
+
+ public abstract void load(LoaderContext context,
+ ConnectionConfiguration connectionConfiguration,
+ JobConfiguration jobConfiguration) throws Exception;
+
+The load method reads data from the DataReader (provided by the context)
+in the `Intermediate representation`_ and loads it to the database in some way.
+
+The Loader must iterate in the load method until the data from the DataReader is exhausted.
+::
+
+ while ((array = context.getDataReader().readArrayRecord()) != null) {
+ ...
+ }
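+
+Putting this together, a load implementation for a hypothetical JDBC based
+connector might look like the following sketch.  The openConnection helper and
+the INSERT statement are assumptions used only for illustration.
+::
+
+  @Override
+  public void load(LoaderContext context,
+                   ConnectionConfiguration connectionConfiguration,
+                   JobConfiguration jobConfiguration) throws Exception {
+    // Hypothetical helper: open a JDBC connection from the connection configuration.
+    Connection connection = openConnection(connectionConfiguration);
+    PreparedStatement statement =
+        connection.prepareStatement("INSERT INTO target_table VALUES (?, ?)");
+
+    Object[] array;
+    while ((array = context.getDataReader().readArrayRecord()) != null) {
+      // Bind one record received from the Sqoop framework and insert it.
+      for (int i = 0; i < array.length; i++) {
+        statement.setObject(i + 1, array[i]);
+      }
+      statement.executeUpdate();
+    }
+
+    statement.close();
+    connection.close();
+  }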
+
+
+Initializer and Destroyer
+-------------------------
+
+The Initializer is instantiated before the submission of the MapReduce job
+to do preparation such as adding dependent jar files.
+
+The Destroyer is instantiated after the MapReduce job finishes to do cleanup.
+
+
+Connector Configurations
+++++++++++++++++++++++++
+
+Configurations
+==============
+
+The definitions of the configurations are represented
+by models defined in the org.apache.sqoop.model package.
+
+
+ConnectionConfigurationClass
+----------------------------
+
+
+JobConfigurationClass
+---------------------
+
+
+ResourceBundle
+==============
+
+Resources for Configurations_ are stored in a properties file
+accessed by the getBundle method of the Connector.
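+
+For example, a connector shipping its resources in a properties file named
+my-connector-resources.properties (a hypothetical name) on its classpath could
+implement getBundle as follows.
+::
+
+  @Override
+  public ResourceBundle getBundle(Locale locale) {
+    // Loads my-connector-resources.properties (or a locale specific variant)
+    // from the connector jar.
+    return ResourceBundle.getBundle("my-connector-resources", locale);
+  }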
+
+
+Validator
+=========
+
+The Validator validates configurations set by users.
+
+
+Internals of the Sqoop2 MapReduce Job
++++++++++++++++++++++++++++++++++++++++++
+
+Sqoop 2 provides common MapReduce modules such as SqoopMapper and SqoopReducer
+for both import and export.
+
+- InputFormat creates splits using the Partitioner.
+
+- SqoopMapper invokes Extractor's extract method.
+
+- SqoopReducer does no actual work.
+
+- OutputFormat invokes Loader's load method (via SqoopOutputFormatLoadExecutor).
+
+.. todo: sequence diagram like figure.
+
+For import, the Extractor provided by the Connector extracts data from databases,
+and the Loader provided by Sqoop2 loads the data into Hadoop.
+
+For export, the Extractor provided by Sqoop2 extracts data from Hadoop,
+and the Loader provided by the Connector loads the data into databases.
+
+
+.. _`Intermediate representation`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation
http://git-wip-us.apache.org/repos/asf/sqoop/blob/d140c5dd/docs/src/site/sphinx/index.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/index.rst b/docs/src/site/sphinx/index.rst
index 15ddfbb..6b03e62 100644
--- a/docs/src/site/sphinx/index.rst
+++ b/docs/src/site/sphinx/index.rst
@@ -61,3 +61,4 @@ Developer Guide
- `Building Sqoop2 <BuildingSqoop2.html>`_
- `Development Environment Setup <DevEnv.html>`_
- `Java Client API Guide <ClientAPI.html>`_
+- `Developing Connector <ConnectorDevelopment.html>`_