sqoop-dev mailing list archives

From "Qian Xu" <sx.a...@googlemail.com>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Tue, 19 Aug 2014 04:31:13 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/
-----------------------------------------------------------

(Updated Aug. 19, 2014, 12:31 p.m.)


Review request for Sqoop.


Changes
-------

Renamed SqoopGenericRecord to SqoopAvroRecord and created a helper class for Avro related
common code.


Repository: sqoop-trunk


Description
-------

The patch adds the ability to import an individual table from an RDBMS into
HDFS as a set of Parquet files. It also adds a new command-line argument,
`--as-parquetfile`.
Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir /path/to/files`

The major items are listed as follows:
* Implement `ParquetImportMapper`.
* Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.
* Support both import from scratch and import in append mode.

As Parquet is a columnar storage format, it doesn't make sense to write to it directly from
record-based tools. We've considered using Kite SDK to simplify the handling of
Parquet-specific details. The main idea is to convert each `SqoopRecord` to a `GenericRecord`
and write it into a Kite dataset. Kite SDK then writes these records out as a set of Parquet files.
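
To illustrate why a buffering layer sits between the record-based mapper and the columnar
files, here is a minimal sketch in plain Java. It is not code from the patch and it does not
use the Kite SDK, Avro, or Parquet APIs; the class `ColumnBufferSketch` and its methods are
hypothetical names made up for this example. It only shows the general idea of collecting
row-oriented records and pivoting them into per-column value lists before anything is written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration only: pivots row-oriented records into
 * per-column value lists. Not Sqoop, Kite SDK, or Parquet code.
 */
public class ColumnBufferSketch {

    // One list of values per column name, kept in first-seen column order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    /** Buffers one row; a column that the row does not contain is stored as null. */
    public void addRow(Map<String, Object> row) {
        // Register columns seen for the first time and back-fill earlier rows with null.
        for (String name : row.keySet()) {
            if (!columns.containsKey(name)) {
                List<Object> values = new ArrayList<>();
                for (int i = 0; i < rowCount; i++) {
                    values.add(null);
                }
                columns.put(name, values);
            }
        }
        // Append this row's value (or null) to every known column.
        for (Map.Entry<String, List<Object>> entry : columns.entrySet()) {
            entry.getValue().add(row.get(entry.getKey()));
        }
        rowCount++;
    }

    /** Returns all buffered values of one column, or null if the column is unknown. */
    public List<Object> column(String name) {
        return columns.get(name);
    }

    public int rowCount() {
        return rowCount;
    }

    public static void main(String[] args) {
        ColumnBufferSketch buffer = new ColumnBufferSketch();

        Map<String, Object> first = new LinkedHashMap<>();
        first.put("id", 1);
        first.put("name", "alice");
        buffer.addRow(first);

        Map<String, Object> second = new LinkedHashMap<>();
        second.put("id", 2);
        second.put("name", "bob");
        buffer.addRow(second);

        System.out.println(buffer.column("id"));   // prints [1, 2]
        System.out.println(buffer.column("name")); // prints [alice, bob]
    }
}
```

A real columnar writer also handles typing, encoding, and compression per column, which is
the part we would rather leave to an existing library than implement in the mapper.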


Diffs (updated)
-----

  ivy.xml abc12a1 
  ivy/libraries.properties a59471e 
  src/docs/man/import-args.txt a4ce4ec 
  src/docs/man/sqoop-import-all-tables.txt 6b639f5 
  src/docs/user/hcatalog.txt cd1dde3 
  src/docs/user/help.txt a9e1e89 
  src/docs/user/import-all-tables.txt 60645f1 
  src/docs/user/import.txt 192e97e 
  src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
  src/java/org/apache/sqoop/avro/AvroUtil.java PRE-CREATION 
  src/java/org/apache/sqoop/lib/SqoopAvroRecord.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/AvroImportMapper.java 289eb28 
  src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
  src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
  src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
  src/java/org/apache/sqoop/orm/ClassWriter.java 94ff576 
  src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
  src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
  src/java/org/apache/sqoop/util/AppendUtils.java 5eaaa95 
  src/licenses/LICENSE-BIN.txt 4215d26 
  src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 

Diff: https://reviews.apache.org/r/24223/diff/


Testing
-------

Included 4 test cases. All of them pass.


Thanks,

Qian Xu

