sqoop-dev mailing list archives

From "Qian Xu" <sx.a...@googlemail.com>
Subject Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Mon, 04 Aug 2014 04:20:13 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/
-----------------------------------------------------------

Review request for Sqoop.


Repository: sqoop-trunk


Description
-------

This patch adds the ability to import an individual table from an RDBMS into
HDFS as a set of Parquet files. It introduces a new command-line argument,
`--as-parquetfile`.
Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir /path/to/files`

The major items are as follows:
* Implement `ParquetImportMapper`.
* Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.

As Parquet is a columnar storage format, it doesn't make sense to write to it directly,
one record at a time, from record-based tools. We are considering using the Kite SDK to
simplify the Parquet-specific handling. The main idea is to convert each `SqoopRecord` into a
`GenericRecord` and write it into a Kite dataset. The Kite SDK then stores these records as a
set of Parquet files.
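
To illustrate why a columnar format does not pair naturally with record-at-a-time writes, here is a minimal, library-free sketch of the row-to-column pivot that a columnar writer has to perform on a buffered batch of rows. It is not code from this patch and not how Kite or Parquet is implemented; the class name `ColumnarBufferSketch` and the method `toColumns` are hypothetical, and rows are modelled as plain maps rather than `SqoopRecord` or `GenericRecord` instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: pivots a buffered batch of row records into per-column value lists. */
public class ColumnarBufferSketch {

    /**
     * Pivots rows (each a column-name to value map) into one list of values per column.
     * A value missing from a row is stored as null so every column keeps the same length.
     */
    public static Map<String, List<Object>> toColumns(List<Map<String, Object>> rows) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        // First pass: collect column names in first-seen order.
        for (Map<String, Object> row : rows) {
            for (String name : row.keySet()) {
                columns.computeIfAbsent(name, k -> new ArrayList<>());
            }
        }
        // Second pass: append each row's value (or null) to every column.
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, List<Object>> column : columns.entrySet()) {
                column.getValue().add(row.get(column.getKey()));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for imported table records.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "alice");
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2);
        second.put("name", "bob");

        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(first);
        rows.add(second);

        // Values are now grouped by column rather than by row.
        System.out.println(toColumns(rows));
    }
}
```

Because the whole batch must be available before any column can be laid out, this buffering and pivoting is the kind of work the proposal leaves to a library instead of handling inside the mapper.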


Diffs
-----

  ivy.xml abc12a1 
  ivy/libraries.properties a59471e 
  pom-old.xml a8f4361 
  src/docs/man/import-args.txt a4ce4ec 
  src/docs/man/sqoop-import-all-tables.txt 6b639f5 
  src/docs/user/hcatalog.txt cd1dde3 
  src/docs/user/help.txt a9e1e89 
  src/docs/user/import-all-tables.txt 60645f1 
  src/docs/user/import.txt 192e97e 
  src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
  src/java/com/cloudera/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
  src/java/com/cloudera/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
  src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java a5f72f7 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
  src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
  src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
  src/licenses/LICENSE-BIN.txt 4215d26 

Diff: https://reviews.apache.org/r/24223/diff/


Testing
-------

Manually tested with a MySQL database. Unit tests are still under development.


Thanks,

Qian Xu

