sqoop-dev mailing list archives

From "Ryan Blue" <b...@apache.org>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Wed, 06 Aug 2014 16:41:31 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review49742
-----------------------------------------------------------


I agree with Joey; it would be better to use the DatasetKeyOutputFormat so you don't
have to maintain an output format of your own.
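
For reference, here is a minimal sketch of what that wiring might look like, assuming
Kite's `Datasets`/`DatasetDescriptor` API and a
`DatasetKeyOutputFormat.configure(job).writeTo(...)` builder (the class and method
names are from memory and may differ across Kite versions; the dataset URI below is
hypothetical):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.mapreduce.Job;
    import org.kitesdk.data.Dataset;
    import org.kitesdk.data.DatasetDescriptor;
    import org.kitesdk.data.Datasets;
    import org.kitesdk.data.Formats;
    import org.kitesdk.data.mapreduce.DatasetKeyOutputFormat;

    public class ParquetJobSetup {
      static void configureOutput(Job job, Schema schema, String targetDir) {
        DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
            .schema(schema)           // Avro schema generated for the table
            .format(Formats.PARQUET)  // store the records as Parquet files
            .build();
        // Hypothetical URI; Kite addresses datasets as "dataset:<scheme>:<path>".
        Dataset<GenericRecord> dataset =
            Datasets.create("dataset:hdfs:" + targetDir, descriptor);
        // Kite's output format then owns the Parquet writing and task commits,
        // so Sqoop doesn't need to carry its own ParquetOutputFormat.
        DatasetKeyOutputFormat.configure(job).writeTo(dataset);
        job.setOutputFormatClass(DatasetKeyOutputFormat.class);
      }
    }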

You might also consider implementing a wrapper for SqoopRecord that implements GenericRecord
[1]. That would remove the need to copy the values from one map to the other.

[1]: http://avro.apache.org/docs/1.7.6/api/java/org/apache/avro/generic/GenericRecord.html
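
A sketch of such a wrapper, assuming `SqoopRecord#getFieldMap()` and
`SqoopRecord#setField(String, Object)` (the class name here is just illustrative):

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.sqoop.lib.SqoopRecord;

    public class SqoopRecordGenericRecord implements GenericRecord {
      private final Schema schema;
      private final SqoopRecord delegate;

      public SqoopRecordGenericRecord(Schema schema, SqoopRecord delegate) {
        this.schema = schema;
        this.delegate = delegate;
      }

      @Override
      public Object get(String key) {
        // Read straight out of the SqoopRecord's field map; nothing is copied.
        return delegate.getFieldMap().get(key);
      }

      @Override
      public void put(String key, Object value) {
        delegate.setField(key, value);
      }

      @Override
      public Object get(int i) {
        // Positional access resolves the field name through the schema.
        return get(schema.getFields().get(i).name());
      }

      @Override
      public void put(int i, Object value) {
        put(schema.getFields().get(i).name(), value);
      }

      @Override
      public Schema getSchema() {
        return schema;
      }
    }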

- Ryan Blue


On Aug. 6, 2014, 12:56 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2014, 12:56 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The patch proposes adding the ability to import an individual table from an RDBMS
> into HDFS as a set of Parquet files. It also adds a new command-line argument,
> `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile
> --target-dir /path/to/files`
> 
> The major items are as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.
> 
> As Parquet is a columnar storage format, it doesn't make sense to write to it directly
> from a record-based tool. We've considered using the Kite SDK to simplify the handling
> of Parquet-specific details. The main idea is to convert each `SqoopRecord` to a
> `GenericRecord` and write it into a Kite dataset; the Kite SDK then persists these
> records as a set of Parquet files.
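> 
> A rough sketch of that conversion in the mapper (the mapper base class, the exact
> key/value types, and the configuration key for the schema are assumptions, not the
> final patch):
> 
>     import java.io.IOException;
>     import java.util.Map;
>     import org.apache.avro.Schema;
>     import org.apache.avro.generic.GenericData;
>     import org.apache.avro.generic.GenericRecord;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.io.NullWritable;
>     import org.apache.hadoop.mapreduce.Mapper;
>     import org.apache.sqoop.lib.SqoopRecord;
> 
>     public class ParquetImportMapper
>         extends Mapper<LongWritable, SqoopRecord, GenericRecord, NullWritable> {
> 
>       private Schema schema;
> 
>       @Override
>       protected void setup(Context context) {
>         // Hypothetical conf key; the table's Avro schema is passed via the job conf.
>         schema = new Schema.Parser().parse(
>             context.getConfiguration().get("parquetjob.avro.schema"));
>       }
> 
>       @Override
>       protected void map(LongWritable key, SqoopRecord val, Context context)
>           throws IOException, InterruptedException {
>         GenericRecord record = new GenericData.Record(schema);
>         // Copy each column value from the SqoopRecord into the Avro record.
>         for (Map.Entry<String, Object> entry : val.getFieldMap().entrySet()) {
>           record.put(entry.getKey(), entry.getValue());
>         }
>         // A Kite-style output format takes the record as the key.
>         context.write(record, NullWritable.get());
>       }
>     }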
> 
> 
> Diffs
> -----
> 
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24223/diff/
> 
> 
> Testing
> -------
> 
> Manually tested with a MySQL database. Unit tests are still under development.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>

