sqoop-dev mailing list archives

From "Abraham Elmahrek" <...@cloudera.com>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Tue, 05 Aug 2014 23:41:19 GMT

This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/

First pass... comments below!


    Do we need to include kitesdk for both hadoop1 and hadoop2? If so, see the avro dependency
for an example of how to do this.


    The dependencies can live in ivy alone; there's no need to include them in this pom file.


    Same as above.


    com.cloudera.x is deprecated. No need to provide.


    com.cloudera.x is deprecated. No need to provide.


    You can get rid of this. The com.cloudera.x packages are not maintained any more.


    This is a bit confusing... could you add a few comments as to why an Avro schema would
be used with the ParquetJob?
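
For context, Kite describes every dataset with an Avro schema regardless of the on-disk format, so even a Parquet-backed dataset is declared via Avro. A minimal schema for an imported table might look like the following (a hypothetical illustration; the field names are not from the patch):

```json
{
  "type": "record",
  "name": "ImportedTable",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": ["null", "string"], "default": null}
  ]
}
```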


    I don't believe this is possible. Perhaps you were looking for "Boolean"?

- Abraham Elmahrek

On Aug. 5, 2014, 6:25 a.m., Qian Xu wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> (Updated Aug. 5, 2014, 6:25 a.m.)
> Review request for Sqoop.
> Repository: sqoop-trunk
> Description
> -------
> The patch adds the ability to import an individual table from an RDBMS into HDFS as a set
of Parquet files. It also extends the command-line interface with a new argument, `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir TARGET_DIR`
> The major items are listed as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.
> As Parquet is a columnar storage format, it doesn't make sense to write to it directly
from record-based tools. We've considered using the Kite SDK to simplify handling of the
Parquet-specific details. The main idea is to convert each `SqoopRecord` into a `GenericRecord`
and write them into a Kite dataset; the Kite SDK then persists these records as a set of Parquet files.
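> The flow above can be sketched with Kite's dataset API (a sketch only, assuming the Kite SDK
and Avro are on the classpath; the dataset URI and field names below are hypothetical, not from the patch):
>
> ```java
> import org.apache.avro.Schema;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericRecord;
> import org.kitesdk.data.Dataset;
> import org.kitesdk.data.DatasetDescriptor;
> import org.kitesdk.data.DatasetWriter;
> import org.kitesdk.data.Datasets;
> import org.kitesdk.data.Formats;
>
> public class ParquetWriteSketch {
>   public static void main(String[] args) {
>     // Parse the Avro schema that describes the imported table (illustrative fields).
>     Schema schema = new Schema.Parser().parse(
>         "{\"type\":\"record\",\"name\":\"ImportedTable\",\"fields\":["
>       + "{\"name\":\"id\",\"type\":\"long\"},"
>       + "{\"name\":\"name\",\"type\":\"string\"}]}");
>
>     // Declare a Parquet-format dataset; note Kite still takes an Avro schema here.
>     DatasetDescriptor descriptor = new DatasetDescriptor.Builder()
>         .schema(schema)
>         .format(Formats.PARQUET)
>         .build();
>     Dataset<GenericRecord> dataset =
>         Datasets.create("dataset:hdfs:/tmp/imported_table", descriptor, GenericRecord.class);
>
>     // The import mapper would convert each SqoopRecord into a GenericRecord and write it.
>     GenericRecord record = new GenericData.Record(schema);
>     record.put("id", 1L);
>     record.put("name", "example");
>     DatasetWriter<GenericRecord> writer = dataset.newWriter();
>     try {
>       writer.write(record);
>     } finally {
>       writer.close();
>     }
>   }
> }
> ```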
> Diffs
> -----
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   pom-old.xml a8f4361 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/com/cloudera/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/com/cloudera/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java a5f72f7 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> Diff: https://reviews.apache.org/r/24223/diff/
> Testing
> -------
> Manually tested with a MySQL database. Unit tests are still under development.
> Thanks,
> Qian Xu
