sqoop-dev mailing list archives

From "Qian Xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-1588) TO-side: Write data to HDFS
Date Wed, 15 Oct 2014 08:25:33 GMT
Qian Xu created SQOOP-1588:
------------------------------

             Summary: TO-side: Write data to HDFS
                 Key: SQOOP-1588
                 URL: https://issues.apache.org/jira/browse/SQOOP-1588
             Project: Sqoop
          Issue Type: Sub-task
            Reporter: Qian Xu
            Assignee: Qian Xu


Create a basic Kite connector that can write data (e.g. from a JDBC connection) to HDFS.

The scope is defined as follows:
- Destination: HDFS
- File Format: Avro, Parquet, and CSV
- Compression Codec: Use default
- Partitioner Strategy: Not supported
- Column Mapping: Not supported

Exposed Configuration:
- [Link] File Format (Enum)
- [To] Dataset URI (String, has a validation check) 
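The exposed configuration above could be sketched as follows. This is an illustrative stand-in, not the connector's actual code: the class and method names are hypothetical, and the URI check assumes Kite's `dataset:<scheme>:...` URI convention.

```java
import java.util.regex.Pattern;

public class ToJobConfigSketch {

    // The file formats in scope for the initial connector (see above).
    enum FileFormat { AVRO, PARQUET, CSV }

    // Kite dataset URIs take the form "dataset:<scheme>:/path/namespace/name".
    private static final Pattern DATASET_URI =
        Pattern.compile("^dataset:(hdfs|hive|file):.+");

    /** Hypothetical validation check for the [To] Dataset URI config input. */
    static boolean isValidDatasetUri(String uri) {
        return uri != null && DATASET_URI.matcher(uri).matches();
    }

    public static void main(String[] args) {
        // A dataset URI must carry the "dataset:" prefix and a storage scheme.
        System.out.println(isValidDatasetUri("dataset:hdfs://namenode/datasets/ns/users"));
        System.out.println(isValidDatasetUri("hdfs://namenode/datasets/users"));
    }
}
```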

Workflow:
- Create a link to Kite Connector
- Create a job with valid configuration (see above)
- Start a job
- {{KiteToInitializer}} will check dataset existence 
- Sqoop will create N {{KiteLoader}} instances.
- {{KiteLoader}} will create an Avro schema from the FROM-side schema at runtime.
As Sqoop schema types are not identical to Avro types, a type mapping will happen in place.
The original Sqoop type will be recorded in the Avro schema, so it can be used for reverse
type mapping during data export.
- {{KiteLoader}} will create a temporary dataset and write its allocated data records to it.
In case of any error, the temporary dataset will be deleted.
- {{KiteToDestroy}} will merge all temporary datasets into one dataset.
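The write-to-temporary-then-merge flow in the workflow above can be sketched as below. Plain local files stand in for Kite datasets here, and all names ({{writeTemp}}, {{merge}}) are illustrative, not the connector's actual API.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class TempDatasetMergeSketch {

    // Each loader instance writes its partition of records to its own
    // temporary dataset; on any error the partial dataset is deleted.
    static Path writeTemp(Path dir, int loaderId, List<String> records) throws IOException {
        Path tmp = dir.resolve("tmp-" + loaderId);
        try {
            return Files.write(tmp, records);
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    // The destroyer merges every temporary dataset into the final one,
    // removing each temporary dataset as it goes.
    static Path merge(Path dir, Path target) throws IOException {
        try (DirectoryStream<Path> temps = Files.newDirectoryStream(dir, "tmp-*")) {
            for (Path tmp : temps) {
                Files.write(target, Files.readAllLines(tmp),
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                Files.delete(tmp);
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("kite-sketch");
        writeTemp(dir, 0, List.of("a,1", "b,2"));
        writeTemp(dir, 1, List.of("c,3"));
        Path merged = merge(dir, dir.resolve("final"));
        System.out.println(Files.readAllLines(merged).size());
    }
}
```

The same shape applies with N loaders: merging is only done once, by the destroyer, after all loaders have finished, so a failed loader never leaves partial records in the final dataset.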

Further features will be implemented in follow-up JIRAs.
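The runtime type-mapping step in the workflow might look like the sketch below. The mapping table and type names are assumptions for illustration; in practice the original Sqoop type would be recorded as a property on the generated Avro field so that export can reverse the mapping.

```java
import java.util.Map;

public class TypeMappingSketch {

    // Illustrative mapping from Sqoop schema types to Avro primitive types.
    static final Map<String, String> SQOOP_TO_AVRO = Map.of(
        "TEXT",           "string",
        "FIXED_POINT",    "long",
        "FLOATING_POINT", "double",
        "DECIMAL",        "string",   // no lossless Avro primitive; kept as string
        "DATE",           "long",     // e.g. epoch milliseconds
        "BIT",            "boolean");

    /** Maps a Sqoop type to an Avro type; the caller keeps the original name. */
    static String toAvro(String sqoopType) {
        String avro = SQOOP_TO_AVRO.get(sqoopType);
        if (avro == null) {
            throw new IllegalArgumentException("Unsupported Sqoop type: " + sqoopType);
        }
        return avro;
    }

    public static void main(String[] args) {
        System.out.println(toAvro("FIXED_POINT"));
    }
}
```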



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
