sqoop-commits mailing list archives

From jar...@apache.org
Subject git commit: SQOOP-1660: DOC: Connector SDK docs + validation to be updated
Date Tue, 04 Nov 2014 05:47:45 GMT
Repository: sqoop
Updated Branches:
  refs/heads/sqoop2 268a47552 -> aabd40b93


SQOOP-1660: DOC: Connector SDK docs + validation to be updated

(Veena Basavaraj via Jarek Jarcec Cecho)


Project: http://git-wip-us.apache.org/repos/asf/sqoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/sqoop/commit/aabd40b9
Tree: http://git-wip-us.apache.org/repos/asf/sqoop/tree/aabd40b9
Diff: http://git-wip-us.apache.org/repos/asf/sqoop/diff/aabd40b9

Branch: refs/heads/sqoop2
Commit: aabd40b936fdf987973bd2b674f66883893c1b59
Parents: 268a475
Author: Jarek Jarcec Cecho <jarcec@apache.org>
Authored: Mon Nov 3 21:46:53 2014 -0800
Committer: Jarek Jarcec Cecho <jarcec@apache.org>
Committed: Mon Nov 3 21:47:35 2014 -0800

----------------------------------------------------------------------
 docs/src/site/sphinx/ConnectorDevelopment.rst | 315 +++++++++++----------
 1 file changed, 166 insertions(+), 149 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/sqoop/blob/aabd40b9/docs/src/site/sphinx/ConnectorDevelopment.rst
----------------------------------------------------------------------
diff --git a/docs/src/site/sphinx/ConnectorDevelopment.rst b/docs/src/site/sphinx/ConnectorDevelopment.rst
index e4b5402..5e61943 100644
--- a/docs/src/site/sphinx/ConnectorDevelopment.rst
+++ b/docs/src/site/sphinx/ConnectorDevelopment.rst
@@ -18,38 +18,34 @@
 Sqoop 2 Connector Development
 =============================
 
-This document describes you how to implement connector for Sqoop 2
-using the code of built-in connector ( ``GenericJdbcConnector`` ) as example.
+This document describes how to implement a connector for Sqoop 2, using the code of
one of the built-in connectors ( ``GenericJdbcConnector`` ) as a reference. Sqoop 2 jobs
support extraction from and/or loading to different data sources. Sqoop 2 connectors encapsulate
the job lifecycle operations for extracting and/or loading data from and/or to
+different data sources. Each connector primarily focuses on a particular data source and
provides a custom implementation for optimally reading and/or writing data in a distributed environment.
 
 .. contents::
 
-What is Connector?
-++++++++++++++++++
+What is a Sqoop Connector?
+++++++++++++++++++++++++++
 
-The connector provides the facilities to interact with external data sources.
-The connector can read from, or write to, a data source.
+The connector provides the facilities to interact with varied data sources and serves as
the means to transfer data between them. The connector implementation provides the logic to read
from and/or write to the data source that it represents. For instance, the ``GenericJdbcConnector``
encapsulates the logic to read from and/or write to JDBC-enabled relational data sources.
The connector part that enables reading from a data source and transferring this data into the internal
Sqoop format is called the FROM, and the part that enables writing data to a data source by
transferring data from the Sqoop format is called the TO. In order to interact with these data sources,
the connector will provide one or many config classes and input fields within them.
+
+
+Broadly we support two main config types for connectors: link type, represented by the enum
``ConfigType.LINK``, and job type, represented by the enum ``ConfigType.JOB``. Link config represents
the properties needed to physically connect to the data source. Job config represents the properties
that are required to invoke reading from and/or writing to a particular dataset in the data
source it connects to. If a connector supports both reading from and writing to, it will provide
the ``FromJobConfig`` and ``ToJobConfig`` objects. Each of these config objects is custom
to each connector and can have one or more inputs associated with each of the Link, FromJob
and ToJob config types. Hence we call connectors configurables, i.e., entities that can
provide configs for interacting with the data source they represent. As the connectors evolve
over time to support new features in their data sources, the configs and inputs will change
as well. Thus the connector API also provides methods for upgrading the config and input names
and the data related to these data sources across different versions.
+
+The connectors implement logic for the various stages of the extract/load process using the connector
API described below. While extracting/reading data from the data source, the main stages are
``Initializer``, ``Partitioner``, ``Extractor`` and ``Destroyer``. While loading/writing
data to the data source, the main stages currently supported are ``Initializer``, ``Loader``
and ``Destroyer``. Each stage has its unique set of responsibilities that are explained in
detail below. Since connectors understand the internals of the data source they represent,
they work in tandem with the Sqoop-supported execution engines such as MapReduce or Spark
(in the future) to accomplish this process in the most optimal way.
 
 When do we add a new connector?
 ===============================
-You add a new connector when you need to extract data from a new data source, or load
-data to a new target.
-In addition to the connector API, Sqoop 2 also has an engine interface.
-At the moment the only engine is MapReduce, but we may support additional engines in the
future.
-Since many parallel execution engines are capable of reading/writing data
-there may be a question of whether support for specific data stores should be done
-through a new connector or new engine.
+You add a new connector when you need to extract/read data from a new data source, or load/write
+data into a new data source that is not supported yet in Sqoop 2.
+In addition to the connector API, Sqoop 2 also has a submission and execution engine interface.
+At the moment the only supported engine is MapReduce, but we may support additional engines
in the future such as Spark. Since many parallel execution engines are capable of reading/writing
data, there may be a question of whether adding support for a new data source should be done
through the connector or the execution engine API.
 
-**Our guideline is:** Connectors should manage all data extract/load. Engines manage job
-life cycles. If you need to support a new data store and don't care how jobs run -
-you are looking to add a connector.
+**Our guidelines are as follows:** Connectors should manage all data extract (reading) from
and/or load (writing) into a data source. The submission and execution engines together manage the
job submission and execution life cycle to read/write data from/to data sources in the most
optimal way possible. If you need to support a new data store and the details of linking to it,
but don't care how the process of reading/writing happens, then you are looking to
add a connector, and you should continue reading the Connector API details below to contribute
new connectors to Sqoop 2.
 
 
 Connector Implementation
 ++++++++++++++++++++++++
 
-The ``SqoopConnector`` class defines functionality
-which must be provided by Connectors.
-Each Connector must extend ``SqoopConnector`` and override the methods shown below.
+The ``SqoopConnector`` class defines an API for the connectors that must be implemented by
the connector developers. Each Connector must extend ``SqoopConnector`` and override the methods
shown below.
 ::
 
   public abstract String getVersion();
@@ -58,27 +54,27 @@ Each Connector must extend ``SqoopConnector`` and override the methods
shown bel
   public abstract Class getJobConfigurationClass(Direction direction);
   public abstract From getFrom();
   public abstract To getTo();
-  public abstract Validator getValidator();
-  public abstract MetadataUpgrader getMetadataUpgrader();
+  public abstract ConnectorConfigurableUpgrader getConfigurableUpgrader();
 
 Connectors can optionally override the following methods:
 ::
 
   public List<Direction> getSupportedDirections();
+  public Class<? extends IntermediateDataFormat<?>> getIntermediateDataFormat();
 
 
 The ``getFrom`` method returns From_ instance
-which is a placeholder for the modules needed to read from a data source.
+which is a ``Transferable`` entity that encapsulates the operations
+needed to read from the data source that the connector represents.
 
-The ``getTo`` method returns Extractor_ instance
-which is a placeholder for the modules needed to write to a data source.
+The ``getTo`` method returns To_ instance
+which is a ``Transferable`` entity that encapsulates the operations
+needed to write to the data source that the connector represents.
 
-Methods such as ``getBundle`` , ``getConnectionConfigurationClass`` ,
-``getJobConfigurationClass`` and ``getValidator``
-are concerned to `Connector configurations`_ .
+Methods such as ``getBundle`` , ``getLinkConfigurationClass`` , ``getJobConfigurationClass``
+are related to `Configurations`_ .
 
-The ``getSupportedDirections`` method returns a list of directions
-that a connector supports. This should be some subset of:
+Since a connector represents a data source and can support one or both of the two directions,
reading FROM its data source and/or writing TO its data source, the ``getSupportedDirections``
method returns a list of directions that a connector will implement. This should be a subset
of the values in the ``Direction`` enum we provide:
 ::
 
   public List<Direction> getSupportedDirections() {
@@ -92,10 +88,7 @@ that a connector supports. This should be some subset of:
 From
 ====
 
-The connector's ``getFrom`` method returns ``From`` instance
-which is a placeholder for the modules needed for reading
-from a data source. Modules such as Partitioner_ and Extractor_ .
-The built-in ``GenericJdbcConnector`` defines ``From`` like this.
+The ``getFrom`` method returns From_ instance which is a ``Transferable`` entity that encapsulates
the operations needed to read from the data source the connector represents. The built-in
``GenericJdbcConnector`` defines ``From`` like this.
 ::
 
   private static final From FROM = new From(
@@ -111,52 +104,46 @@ The built-in ``GenericJdbcConnector`` defines ``From`` like this.
     return FROM;
   }
 
+Initializer and Destroyer
+-------------------------
+.. _Initializer:
+.. _Destroyer:
 
-Extractor
----------
-
-Extractor (E for ETL) extracts data from external database.
-
-Extractor must overrides ``extract`` method.
+The ``Initializer`` is instantiated before the submission of the sqoop job to the execution engine and
does preparation such as connecting to the data source, creating temporary tables or adding
dependent jar files. Initializers are executed as the first step in the sqoop job lifecycle.
Here is the ``Initializer`` API.
 ::
 
-  public abstract void extract(ExtractorContext context,
-                               ConnectionConfiguration connectionConfiguration,
-                               JobConfiguration jobConfiguration,
-                               Partition partition);
+  public abstract void initialize(InitializerContext context, LinkConfiguration linkConfiguration,
+      JobConfiguration jobConfiguration);
 
-The ``extract`` method extracts data from database in some way and
-writes it to ``DataWriter`` (provided by context) as `Intermediate representation`_ .
+  public List<String> getJars(InitializerContext context, LinkConfiguration linkConfiguration,
+      JobConfiguration jobConfiguration);
+ 
+  public abstract Schema getSchema(InitializerContext context, LinkConfiguration linkConfiguration,
+      JobConfiguration jobConfiguration);
 
-Extractors use Writer's provided by the ExtractorContext to send a record through the
-framework.
-::
-
-  context.getDataWriter().writeArrayRecord(array);
+In addition to the initialize() method, where the job execution preparation activities occur,
the ``Initializer`` must also implement the getSchema() method for the direction it supports.
The getSchema() method is used by the sqoop system to match the data extracted/read by the
``From`` instance of the connector with the data loaded/written to the ``To`` instance
of the connector. In the case of a relational database or columnar database, the returned
Schema object will include a collection of columns with their data types. If the data source
is schema-less, such as a file, an empty Schema can be returned (i.e., a Schema object without
any columns).
 
-The extractor must iterate through the entire dataset in the ``extract`` method.
-::
+NOTE: Sqoop 2 currently does not support extract and load between two connectors that represent
schema-less data sources. We expect that at least the ``From`` instance of the connector or
the ``To`` instance of the connector in the sqoop job will have a schema. If both ``From``
and ``To`` have an associated non-empty schema, Sqoop 2 will load data by column name, i.e.,
data in column "A" in the ``From`` instance of the connector for the job will be loaded to column
"A" in the ``To`` instance of the connector for that job.
 
-  while (resultSet.next()) {
-    ...
-    context.getDataWriter().writeArrayRecord(array);
-    ...
-  }
 
+``Destroyer`` is instantiated after the execution engine finishes its processing. It is the
last step in the sqoop job lifecycle, so pending clean-up tasks such as dropping temporary
tables and closing connections happen here. The term destroyer is a little misleading: it also
represents the phase where final output commits to the data source can happen, in the case of
the ``To`` instance of the connector code.
 
 Partitioner
 -----------
 
-The Partitioner creates ``Partition`` instances based on configurations.
-The number of ``Partition`` instances is decided
-based on the value users specified as the numbers of extractors
-in job configuration.
+The ``Partitioner`` creates ``Partition`` instances ranging from 1..N. N is driven by
a configuration as well; the default number of partitions created is set to 10 in the sqoop code.
+
+``Partitioner`` must implement the ``getPartitions`` method of the ``Partitioner`` API.
+
+::
+
+  public abstract List<Partition> getPartitions(PartitionerContext context,
+      LinkConfiguration linkConfiguration, FromJobConfiguration jobConfiguration);
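As an illustration, the kind of work a ``getPartitions`` implementation does can be sketched in plain Java. The ``IntRangePartition`` class and the numeric min/max bounds below are hypothetical stand-ins, not part of the Sqoop API; the sketch splits a numeric key range into roughly equal slices, similar in spirit to what ``GenericJdbcPartitioner`` does for a numeric partition column.
::

  import java.util.ArrayList;
  import java.util.List;

  class RangePartitioner {
    // Split the half-open key range [min, max) into at most n contiguous slices.
    static List<IntRangePartition> getPartitions(long min, long max, int n) {
      List<IntRangePartition> parts = new ArrayList<>();
      long step = Math.max(1, (max - min) / n);
      for (long lo = min; lo < max; lo += step) {
        parts.add(new IntRangePartition(lo, Math.min(lo + step, max)));
      }
      return parts;
    }

    public static void main(String[] args) {
      // 10 is the default partition count mentioned above.
      List<IntRangePartition> parts = getPartitions(0, 100, 10);
      System.out.println(parts.size() + " partitions, first = " + parts.get(0));
    }
  }

  // Hypothetical stand-in for a connector's Partition subclass (not the Sqoop API).
  class IntRangePartition {
    final long lo;  // inclusive lower bound
    final long hi;  // exclusive upper bound
    IntRangePartition(long lo, long hi) { this.lo = lo; this.hi = hi; }
    @Override public String toString() { return "[" + lo + ", " + hi + ")"; }
  }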
 
 ``Partition`` instances are passed to Extractor_ as the argument of ``extract`` method.
-Extractor_ determines which portion of the data to extract by Partition.
+Extractor_ determines which portion of the data to extract by a given partition.
 
-There is no actual convention for Partition classes
-other than being actually ``Writable`` and ``toString()`` -able.
+There is no actual convention for ``Partition`` classes other than being ``Writable``
and ``toString()``-able. Here is the ``Partition`` API.
 ::
 
   public abstract class Partition {
@@ -165,36 +152,41 @@ other than being actually ``Writable`` and ``toString()`` -able.
     public abstract String toString();
   }
 
-Connectors can define the design of ``Partition`` on their own.
+Connectors can implement custom ``Partition`` classes. ``GenericJdbcPartitioner`` is one
such example: it returns ``GenericJdbcPartition`` objects.
 
+Extractor
+---------
 
-Initializer and Destroyer
--------------------------
-.. _Initializer:
-.. _Destroyer:
+The Extractor (E for ETL) extracts data from a given data source.
+``Extractor`` must implement the ``extract`` method of the ``Extractor`` API.
+::
 
-Initializer is instantiated before the submission of MapReduce job
-for doing preparation such as connecting to the data source, creating temporary tables or
adding dependent jar files.
+  public abstract void extract(ExtractorContext context,
+                               LinkConfiguration linkConfiguration,
+                               JobConfiguration jobConfiguration,
+                               SqoopPartition partition);
 
-In addition to the Initialize() method where the preparation activities occur, the Initializer
must implement a getSchema() method.
-This method is used by the framework to match the data extracted by the ``From`` connector
with the data as the ``To`` connector expects it.
-In case of a relational database or columnar database, the returned Schema object will include
collection of columns with their data types.
-If the data source is schema-less, such as a file, an empty Schema object can be returned
(i.e a Schema object without any columns).
+The ``extract`` method extracts data from the data source using the link and job configuration
properties and writes it to the ``DataWriter`` (provided by the extractor context) as the
default `Intermediate representation`_ .
 
-Note that Sqoop2 currently does not support ETL between two schema-less sources. We expect
for each job that either the connector providing
-the ``From`` instance or the connector providing the ``To`` instance will have a schema.
If both instances have a schema, Sqoop2 will load data by column name.
-I.e, data in column "A" in data source will be loaded to column "A" in target.
+Extractors use the ``DataWriter`` provided by the ``ExtractorContext`` to send a record through the sqoop
system.
+::
 
-Destroyer is instantiated after MapReduce job is finished for clean up, for example dropping
temporary tables and closing connections.
+  context.getDataWriter().writeArrayRecord(array);
+
+The extractor must iterate through the given partition in the ``extract`` method.
+::
+
+  while (resultSet.next()) {
+    ...
+    context.getDataWriter().writeArrayRecord(array);
+    ...
+  }
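Tying the pieces together, here is a self-contained sketch of an extractor iterating its assigned partition and pushing rows to a writer. ``StubDataWriter`` and the integer range are hypothetical stand-ins for the context-provided ``DataWriter`` and the connector's ``Partition``, not Sqoop classes.
::

  import java.util.ArrayList;
  import java.util.List;

  class ExtractorSketch {
    // "Extract" every row in the half-open range [lo, hi) assigned by the partition.
    static void extract(StubDataWriter writer, long lo, long hi) {
      for (long id = lo; id < hi; id++) {
        writer.writeArrayRecord(new Object[]{id});  // one record per row
      }
    }

    public static void main(String[] args) {
      StubDataWriter writer = new StubDataWriter();
      extract(writer, 0, 5);
      System.out.println("extracted " + writer.written.size() + " records");
    }
  }

  // Hypothetical stand-in for the DataWriter the ExtractorContext would provide.
  class StubDataWriter {
    final List<Object[]> written = new ArrayList<>();
    void writeArrayRecord(Object[] array) { written.add(array); }
  }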
 
 
 To
 ==
 
-The Connector's ``getTo`` method returns a ``To`` instance
-which is a placeholder for the modules needed for writing
-to a data source such as Loader_ .
-The built-in ``GenericJdbcConnector`` defines ``To`` like this.
+The ``getTo`` method returns a ``To`` instance which is a ``Transferable`` entity that encapsulates
the operations needed to write data to the data source the connector represents. The built-in
``GenericJdbcConnector`` defines ``To`` like this.
 ::
 
   private static final To TO = new To(
@@ -210,21 +202,26 @@ The built-in ``GenericJdbcConnector`` defines ``To`` like this.
   }
 
 
+Initializer and Destroyer
+-------------------------
+
+Initializer_ and Destroyer_ of a ``To`` instance are used in a similar way to those of a
``From`` instance.
+Refer to the previous section for more details.
+
+
 Loader
 ------
 
-A loader (L for ETL) receives data from the Sqoop framework and
-loads it to an external database.
+A loader (L for ETL) receives data from the ``From`` instance of the sqoop connector associated
with the sqoop job and then loads it into the ``To`` instance of the connector associated with
the same sqoop job.
 
-Loaders must overrides ``load`` method.
+``Loader`` must implement the ``load`` method of the ``Loader`` API.
 ::
 
   public abstract void load(LoaderContext context,
                            LinkConfiguration linkConfiguration,
                             JobConfiguration jobConfiguration) throws Exception;
 
-The ``load`` method reads data from ``DataReader`` (provided by context)
-in `Intermediate representation`_ and loads it to database in some way.
+The ``load`` method reads data from the ``DataReader`` (provided by the context) in the default `Intermediate
representation`_ and loads it to the data source.
 
 Loader must iterate in the ``load`` method until the data from ``DataReader`` is exhausted.
 ::
@@ -233,23 +230,14 @@ Loader must iterate in the ``load`` method until the data from ``DataReader``
is
     ...
   }
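The drain loop above can be sketched as a self-contained example. ``StubDataReader`` below is a hypothetical stand-in for the ``DataReader`` that the real ``LoaderContext`` provides, assuming the convention that the reader returns ``null`` once the data is exhausted.
::

  import java.util.ArrayList;
  import java.util.Arrays;
  import java.util.Iterator;
  import java.util.List;

  class LoaderSketch {
    // Drain the reader until it is exhausted, "loading" each record into the sink.
    static int load(StubDataReader reader, List<Object[]> sink) {
      Object[] record;
      int count = 0;
      while ((record = reader.readArrayRecord()) != null) {
        sink.add(record);  // a real loader would write to its data source here
        count++;
      }
      return count;
    }

    public static void main(String[] args) {
      StubDataReader reader = new StubDataReader(Arrays.asList(
          new Object[]{1, "a"}, new Object[]{2, "b"}));
      List<Object[]> sink = new ArrayList<>();
      System.out.println("loaded " + load(reader, sink) + " records");
    }
  }

  // Hypothetical stand-in for the DataReader the LoaderContext would provide.
  class StubDataReader {
    private final Iterator<Object[]> it;
    StubDataReader(List<Object[]> records) { this.it = records.iterator(); }
    Object[] readArrayRecord() { return it.hasNext() ? it.next() : null; }
  }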
 
+NOTE: we do not yet support a stage for connector developers to control how to balance the
loading/writing of data across the multiple loaders. In the future we may add this to the
connector API to allow custom logic to balance the loading across multiple reducers.
 
-Initializer and Destroyer
--------------------------
-
-Initializer_ and Destroyer_ of a ``To`` instance are used in a similar way to those of a
``From`` instance.
-Refer to the previous section for more details.
-
-
-Connector Configurations
-++++++++++++++++++++++++
-
-Connector specifications
-========================
+Configurables
++++++++++++++
 
-Sqoop loads definitions of connectors
-from the file named ``sqoopconnector.properties``
-which each connector implementation provides.
+Configurable registration
+=========================
+Connectors are one of the currently supported configurables in Sqoop. Sqoop 2 registers
definitions of connectors from the file named ``sqoopconnector.properties``, which each connector
implementation should provide to become available in Sqoop.
 ::
 
   # Generic JDBC Connector Properties
@@ -260,14 +248,12 @@ which each connector implementation provides.
 Configurations
 ==============
 
-Implementations of ``SqoopConnector`` overrides methods such as
-``getConnectionConfigurationClass`` and ``getJobConfigurationClass``
-returning configuration class.
+Implementations of ``SqoopConnector`` override methods such as ``getLinkConfigurationClass``
and ``getJobConfigurationClass``, returning the corresponding configuration classes.
 ::
 
   @Override
-  public Class getConnectionConfigurationClass() {
-    return ConnectionConfiguration.class;
+  public Class getLinkConfigurationClass() {
+    return LinkConfiguration.class;
   }
 
   @Override
@@ -282,43 +268,52 @@ returning configuration class.
     }
   }
 
-Configurations are represented
-by models defined in ``org.apache.sqoop.model`` package.
-Annotations such as
-``ConfigurationClass`` , ``FormClass`` , ``Form`` and ``Input``
-are provided for defining configurations of each connectors
-using these models.
+Configurations are represented by annotations defined in ``org.apache.sqoop.model`` package.
+Annotations such as ``ConfigurationClass`` , ``ConfigClass`` , ``Config`` and ``Input``
+are provided for defining configuration objects for each connector.
 
-``ConfigurationClass`` is a place holder for ``FormClasses`` .
+``@ConfigurationClass`` is a marker annotation for ``ConfigurationClasses`` that hold a
group or list of ``ConfigClasses`` annotated with the marker ``@ConfigClass``.
 ::
 
   @ConfigurationClass
-  public class ConnectionConfiguration {
+  public class LinkConfiguration {
 
-    @Form public ConnectionForm connection;
+    @Config public LinkConfig linkConfig;
 
-    public ConnectionConfiguration() {
-      connection = new ConnectionForm();
+    public LinkConfiguration() {
+      linkConfig = new LinkConfig();
     }
   }
 
-Each ``FormClass`` defines names and types of configs.
+Each ``ConfigClass`` defines the different inputs it exposes for the link and job configs.
These inputs are annotated with ``@Input``, and the user will be asked to fill them in when
creating a sqoop job and choosing to use this instance of the connector for either the ``From``
or ``To`` part of the job.
+
 ::
 
-  @FormClass
-  public class ConnectionForm {
-    @Input(size = 128) public String jdbcDriver;
-    @Input(size = 128) public String connectionString;
-    @Input(size = 40)  public String username;
-    @Input(size = 40, sensitive = true) public String password;
-    @Input public Map<String, String> jdbcProperties;
-  }
+    @ConfigClass(validators = {@Validator(LinkConfig.ConfigValidator.class)})
+    public class LinkConfig {
+      @Input(size = 128, validators = {@Validator(NotEmpty.class), @Validator(ClassAvailable.class)})
+      public String jdbcDriver;
+      @Input(size = 128) public String connectionString;
+      @Input(size = 40)  public String username;
+      @Input(size = 40, sensitive = true) public String password;
+      @Input public Map<String, String> jdbcProperties;
+    }
 
+Each ``ConfigClass`` and the inputs within the configs annotated with ``@Input`` can specify
validators via the ``@Validator`` annotation described below.
 
-ResourceBundle
-==============
+Empty Configuration
+-------------------
+If a connector does not have any configuration inputs to specify for the ``ConfigType.LINK``
or ``ConfigType.JOB``, it is recommended to return the ``EmptyConfiguration`` class from the
``getLinkConfigurationClass()`` or ``getJobConfigurationClass(..)`` methods.
+::
 
-Resources used by client user interfaces are defined in properties file.
+   @ConfigurationClass
+   public class EmptyConfiguration { }
+
+
+Configuration ResourceBundle
+============================
+
+The config and its corresponding input names, along with the input field descriptions, are represented
in the config resource bundle defined per connector.
 ::
 
   # jdbc driver
@@ -333,7 +328,7 @@ Resources used by client user interfaces are defined in properties file.
 
   ...
 
-Those resources are loaded by ``getBundle`` method of connector.
+Those resources are loaded by the ``getBundle`` method of the ``SqoopConnector``.
 ::
 
   @Override
@@ -343,22 +338,44 @@ Those resources are loaded by ``getBundle`` method of connector.
   }
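A self-contained sketch of how such a bundle behaves, using an in-code ``java.util.ListResourceBundle`` instead of a ``.properties`` file. The key names below are illustrative, modeled on the label/help pattern, not taken from the connector.
::

  import java.util.ListResourceBundle;
  import java.util.ResourceBundle;

  class BundleDemo {
    public static void main(String[] args) {
      ResourceBundle bundle = new ConnectorBundle();
      // A client UI would look up labels and help text by input name.
      System.out.println(bundle.getString("linkConfig.jdbcDriver.label"));
    }
  }

  // Hypothetical in-code bundle mirroring a connector's resource bundle entries.
  class ConnectorBundle extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
      return new Object[][] {
        {"linkConfig.jdbcDriver.label", "JDBC Driver Class"},
        {"linkConfig.jdbcDriver.help", "Enter the fully qualified driver class name"},
      };
    }
  }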
 
 
-Validator
-=========
+Validations for Configs and Inputs
+==================================
+
+Validators validate the config objects and the inputs associated with the config objects.
For the config objects themselves, we encourage developers to write custom validators for both
the link and job config types.
+
+::
+
+   @Input(size = 128, validators = {@Validator(value = StartsWith.class, strArg = "jdbc:")})
+
+   @Input(size = 255, validators = { @Validator(NotEmpty.class) })
+  
+Sqoop 2 provides a list of standard input validators that can be used by different connectors
for the link and job type configuration inputs.
+
+::
+
+    public class NotEmpty extends AbstractValidator<String> {
+      @Override
+      public void validate(String instance) {
+        if (instance == null || instance.isEmpty()) {
+          addMessage(Status.ERROR, "Can't be null nor empty");
+        }
+      }
+    }
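The ``NotEmpty`` validator above depends on Sqoop's ``AbstractValidator`` base class. Here is a minimal, self-contained sketch of how such a validator base and check interact; the ``AbstractValidator`` and ``Status`` below are simplified stand-ins, not the Sqoop implementations.
::

  import java.util.ArrayList;
  import java.util.List;

  class ValidatorDemo {
    public static void main(String[] args) {
      NotEmptySketch v = new NotEmptySketch();
      v.validate("");  // an empty input should produce an ERROR message
      System.out.println(v.getMessages());
    }
  }

  // Simplified stand-ins for Sqoop's validator base class and status enum.
  enum Status { OK, ERROR }

  abstract class AbstractValidatorSketch<T> {
    private final List<String> messages = new ArrayList<>();
    protected void addMessage(Status status, String msg) {
      messages.add(status + ": " + msg);
    }
    public List<String> getMessages() { return messages; }
    public abstract void validate(T instance);
  }

  // Mirrors the NotEmpty validator shown above.
  class NotEmptySketch extends AbstractValidatorSketch<String> {
    @Override
    public void validate(String instance) {
      if (instance == null || instance.isEmpty()) {
        addMessage(Status.ERROR, "Can't be null nor empty");
      }
    }
  }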
+
+The validation logic is executed when users enter input values for the link and job configs,
while creating sqoop jobs, for the ``From`` and ``To`` instances of the connectors associated
with the job.
 
-Validator validates configurations set by users.
 
+Sqoop 2 MapReduce Job Execution Lifecycle with Connector API
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 
-Internal of Sqoop2 MapReduce Job
-++++++++++++++++++++++++++++++++
+Sqoop 2 provides MapReduce utilities such as ``SqoopMapper`` and ``SqoopReducer`` that aid
sqoop job execution.
 
-Sqoop 2 provides common MapReduce modules such as ``SqoopMapper`` and ``SqoopReducer``.
+Note: Any class prefixed with Sqoop is an internal sqoop class provided for MapReduce and
is not part of the connector API. These internal classes work with the custom implementations
of ``Extractor`` and ``Partitioner`` in the ``From`` instance and ``Loader`` in the ``To`` instance
of the connector.
 
-When reading from a data source, the ``Extractor`` provided by the FROM connector extracts
data from a database,
-and the ``Loader``, provided by the TO connector, loads data into another database.
+When reading from a data source, the ``Extractor`` provided by the ``From`` instance of the
connector extracts data from the corresponding data source it represents, and the ``Loader``,
provided by the ``To`` instance of the connector, loads data into the data source it represents.
 
 The diagram below describes the initialization phase of a job.
-``SqoopInputFormat`` create splits using ``Partitioner`` .
+``SqoopInputFormat`` creates splits using ``Partitioner``.
 ::
 
       ,----------------.          ,-----------.
@@ -377,16 +394,16 @@ The diagram below describes the initialization phase of a job.
               |                         |              |          `----+-----'
 
 The diagram below describes the map phase of a job.
-``SqoopMapper`` invokes FROM connector's extractor's ``extract`` method.
+``SqoopMapper`` invokes ``From`` connector's extractor's ``extract`` method.
 ::
 
       ,-----------.
       |SqoopMapper|
       `-----+-----'
      run    |
-  --------->|                                   ,-------------.
-            |---------------------------------->|MapDataWriter|
-            |                                   `------+------'
+  --------->|                                   ,------------------.
+            |---------------------------------->|SqoopMapDataWriter|
+            |                                   `------+-----------'
             |                ,---------.               |
             |--------------> |Extractor|               |
             |                `----+----'               |
@@ -404,12 +421,12 @@ The diagram below describes the map phase of a job.
             |                     |                    |-------------------------->
 
 The diagram below describes the reduce phase of a job.
-``OutputFormat`` invokes TO connector's loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor``
).
+``OutputFormat`` invokes ``To`` connector's loader's ``load`` method (via ``SqoopOutputFormatLoadExecutor``
).
 ::
 
-    ,-------.  ,---------------------.
-    |Reducer|  |SqoopNullOutputFormat|
-    `---+---'  `----------+----------'
+    ,------------.  ,---------------------.
+    |SqoopReducer|  |SqoopNullOutputFormat|
+    `---+--------'  `----------+----------'
         |                 |   ,-----------------------------.
         |                 |-> |SqoopOutputFormatLoadExecutor|
         |                 |   `--------------+--------------'        ,----.

