sqoop-dev mailing list archives

From "Abraham Elmahrek" <...@cloudera.com>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Tue, 05 Aug 2014 23:41:19 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review49642
-----------------------------------------------------------


First pass... comments below!


ivy.xml
<https://reviews.apache.org/r/24223/#comment86918>

    Do we need to include kitesdk for both hadoop1 and hadoop2? If so, see the avro dependency for an example of how to do this.



pom-old.xml
<https://reviews.apache.org/r/24223/#comment86916>

    The dependencies can exist in ivy only. There's no need to include them in this pom file.



pom-old.xml
<https://reviews.apache.org/r/24223/#comment86917>

    Same as above.



src/java/com/cloudera/sqoop/mapreduce/ParquetImportMapper.java
<https://reviews.apache.org/r/24223/#comment86890>

    com.cloudera.x is deprecated. No need to provide.



src/java/com/cloudera/sqoop/mapreduce/ParquetOutputFormat.java
<https://reviews.apache.org/r/24223/#comment86891>

    com.cloudera.x is deprecated. No need to provide.



src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java
<https://reviews.apache.org/r/24223/#comment86889>

    You can get rid of this. The com.cloudera.x packages are not maintained any more.



src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java
<https://reviews.apache.org/r/24223/#comment86892>

    This is a bit confusing... could you add a few comments explaining why an Avro schema is used with the ParquetJob?



src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java
<https://reviews.apache.org/r/24223/#comment86898>

    I don't believe this is possible. Perhaps you were looking for "Boolean"?


- Abraham Elmahrek


On Aug. 5, 2014, 6:25 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2014, 6:25 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The patch adds the ability to import an individual table from an RDBMS into HDFS as a set of Parquet files. It also adds a new command-line argument, `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir /path/to/files`
> 
> The major items are listed as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.
> 
> As Parquet is a columnar storage format, it doesn't make sense to write to it directly from record-based tools. We've considered using Kite SDK to simplify the handling of Parquet-specific details. The main idea is to convert each `SqoopRecord` to a `GenericRecord` and write the records into a Kite dataset. Kite SDK then stores these records as a set of Parquet files.
> 
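The conversion step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: a plain `Map` stands in for the column values of a `SqoopRecord`, another `Map` stands in for the Avro `GenericRecord`, and the class name `RecordConversionSketch` and the type rules (decimals as strings, dates as epoch milliseconds) are hypothetical choices made for the example.

```java
import java.math.BigDecimal;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the record conversion; not the patch's ParquetImportMapper. */
final class RecordConversionSketch {

    /** Maps one column value to a type that a simple record schema can describe. */
    static Object toRecordValue(Object value) {
        if (value instanceof BigDecimal) {
            // Assumption: decimals are stored as their string form.
            return value.toString();
        }
        if (value instanceof Date) {
            // Assumption: dates and timestamps are stored as epoch milliseconds.
            return ((Date) value).getTime();
        }
        // Integer, Long, Boolean, String and null pass through unchanged.
        return value;
    }

    /** Converts a column-name-to-value map into a generic, schema-friendly record. */
    static Map<String, Object> convert(Map<String, Object> fieldMap) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : fieldMap.entrySet()) {
            record.put(field.getKey(), toRecordValue(field.getValue()));
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42);
        row.put("price", new BigDecimal("19.90"));
        row.put("active", Boolean.TRUE);
        row.put("created", new Date(0L));
        System.out.println(convert(row));
    }
}
```

In a real mapper the converted values would be set on a `GenericRecord` built from the table's Avro schema and handed to the dataset writer; the sketch only shows the per-column normalization.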
> 
> Diffs
> -----
> 
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   pom-old.xml a8f4361 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/com/cloudera/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/com/cloudera/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/com/cloudera/sqoop/tool/BaseSqoopTool.java a5f72f7 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetOutputFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24223/diff/
> 
> 
> Testing
> -------
> 
> Manually tested with a MySQL database. Unit tests are still being developed.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>

