sqoop-dev mailing list archives

From "Ryan Blue" <b...@apache.org>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Tue, 12 Aug 2014 16:52:25 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review50337
-----------------------------------------------------------


Now that you've added support for append, I want to warn you about a slight behavior change
in the upcoming Kite release. In 0.15.0, what you have works fine. In 0.16.0, we are changing
the writeTo method so that it checks that the target is empty, and we've added an "appendTo"
alternative that appends without that check. So when you update to a newer release, you'll
have to switch to appendTo.
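
To illustrate, here is a minimal sketch of the difference, assuming the Kite
DatasetKeyOutputFormat config builder (the exact builder calls and the dataset URI
are assumptions for illustration; check the Kite release notes for the final API):

    import org.apache.hadoop.mapreduce.Job;
    import org.kitesdk.data.mapreduce.DatasetKeyOutputFormat;

    public class KiteWriteModes {
      // Hypothetical dataset URI, for illustration only.
      static final String DATASET_URI = "dataset:hdfs:/path/to/files";

      static void configureOn015(Job job) {
        // Kite 0.15.0: writeTo(...) appends even when the target has data.
        DatasetKeyOutputFormat.configure(job).writeTo(DATASET_URI);
      }

      static void configureOn016(Job job) {
        // Kite 0.16.0: writeTo(...) checks that the target is empty;
        // appendTo(...) keeps the old appending behavior.
        DatasetKeyOutputFormat.configure(job).appendTo(DATASET_URI);
      }
    }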

- Ryan Blue


On Aug. 11, 2014, 11:30 p.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> 
> (Updated Aug. 11, 2014, 11:30 p.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The patch adds the ability to import an individual table from an RDBMS into HDFS as a set
> of Parquet files. It also adds a new command-line argument, `--as-parquetfile`.
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir /path/to/files`
> 
> The major items are as follows:
> * Implement `ParquetImportMapper`.
> * Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job (sketched below).
> * Support both fresh imports and appending to an existing dataset.
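>
> A minimal sketch of the hookup (the `ParquetFile` layout value and the method shape are
> assumptions for illustration; the actual patch may wire this differently):
>
>     // In DataDrivenImportJob: pick the mapper for the requested file layout.
>     @Override
>     protected Class<? extends Mapper> getMapperClass() {
>       if (options.getFileLayout() == SqoopOptions.FileLayout.ParquetFile) {
>         return ParquetImportMapper.class;  // emits GenericRecords for Kite
>       }
>       return super.getMapperClass();  // text, sequence, or Avro layouts
>     }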
> 
> As Parquet is a columnar storage format, it doesn't make sense to write to it directly
> from record-based tools. We use the Kite SDK to simplify the handling of Parquet-specific
> details. The main idea is to convert each `SqoopRecord` to an Avro `GenericRecord` and
> write the records into a Kite dataset; the Kite SDK then persists them as a set of Parquet files.
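>
> To make that flow concrete, here is a minimal sketch of the conversion (the field copying,
> output types, and class name are assumptions; the actual `ParquetImportMapper` in the patch
> may differ):
>
>     import java.io.IOException;
>     import java.util.Map;
>     import org.apache.avro.Schema;
>     import org.apache.avro.generic.GenericData;
>     import org.apache.avro.generic.GenericRecord;
>     import org.apache.hadoop.io.LongWritable;
>     import org.apache.hadoop.mapreduce.Mapper;
>     import org.apache.sqoop.lib.SqoopRecord;
>
>     public class ParquetImportMapperSketch
>         extends Mapper<LongWritable, SqoopRecord, GenericRecord, Void> {
>
>       private Schema schema;  // Avro schema for the table, loaded in setup()
>
>       @Override
>       protected void map(LongWritable key, SqoopRecord val, Context context)
>           throws IOException, InterruptedException {
>         // Copy each column of the SqoopRecord into an Avro GenericRecord.
>         GenericRecord record = new GenericData.Record(schema);
>         for (Map.Entry<String, Object> field : val.getFieldMap().entrySet()) {
>           record.put(field.getKey(), field.getValue());
>         }
>         // Kite's output format persists the records as Parquet files.
>         context.write(record, null);
>       }
>     }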
> 
> 
> Diffs
> -----
> 
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/org/apache/sqoop/lib/SqoopGenericRecord.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 94ff576 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24223/diff/
> 
> 
> Testing
> -------
> 
> Included 4 test cases; all of them pass.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>

