sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request 24223: SQOOP-1390: Import data to HDFS as a set of Parquet files
Date Mon, 18 Aug 2014 16:45:20 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24223/#review50881
-----------------------------------------------------------


I do have couple of comments, but otherwise I feel that we are in a shape to commit this feature!
Please do upload updated patch to both review board and JIRA. I can't commit a change that
is not on JIRA due to licensing issues.


src/java/org/apache/sqoop/lib/SqoopGenericRecord.java
<https://reviews.apache.org/r/24223/#comment88751>

    Nit: Unused import.



src/java/org/apache/sqoop/lib/SqoopGenericRecord.java
<https://reviews.apache.org/r/24223/#comment88756>

    Since this is more a Avro specific record rather then Sqoop "Generic", might I suggest
to rename this class to correspond with that? Something like "SqoopAvroGenericRecord" or "SqoopAvroRecord"?



src/java/org/apache/sqoop/lib/SqoopGenericRecord.java
<https://reviews.apache.org/r/24223/#comment88758>

    This seems as a copy&paste from AvroImportMapper class - let's perhaps put it to some
AvroUtils and reuse the same code?



src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java
<https://reviews.apache.org/r/24223/#comment88750>

    Nit: I believe that Hadoop prefers using NullWritable instead of Void.



src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java
<https://reviews.apache.org/r/24223/#comment88752>

    I can see that we are using this pattern in other mappers as well, could you please create
follow up JIRA to create the temporary directory inside task working dir rather then in tmp?
    
    I think that we should fix that across the code base, so it doesn't seem right to do this
in this particular JIRA :)



src/java/org/apache/sqoop/mapreduce/ParquetJob.java
<https://reviews.apache.org/r/24223/#comment88753>

    Nit: Please don't use "asterisks" imports, we prefer iterating over all required imports.



src/java/org/apache/sqoop/orm/ClassWriter.java
<https://reviews.apache.org/r/24223/#comment88757>

    I would rename this class in similar manner as the class SqoopGenericRecord itself.


Jarcec

- Jarek Cecho


On Aug. 14, 2014, 8:24 a.m., Qian Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24223/
> -----------------------------------------------------------
> 
> (Updated Aug. 14, 2014, 8:24 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The patch proposes to add the possibility to import an individual table from a RDBMS
into HDFS as a set of Parquet files. It also supports a command-line interface with a new
argument `--as-parquetfile`
> Example invocation: `sqoop import --connect JDBC_URI --table TABLE --as-parquetfile --target-dir
/path/to/files`
> 
> The major items are listed as follows:
> *Implement `ParquetImportMapper`
> *Hook up the `ParquetOutputFormat` and `ParquetImportMapper` in the import job.
> *Support both import from scratch and in append mode
> 
> As Parquet is a columnar storage format, it doesn't make sense to write to it directly
from record-based tools. We've considered of using Kite SDK to simplify the handling of Parquet
specific things. The major idea is to convert `SqoopRecord` as `GenericRecord` and write them
into a Kite dataset. Kite SDK will convert these records to as a set of Parquet files.
> 
> 
> Diffs
> -----
> 
>   ivy.xml abc12a1 
>   ivy/libraries.properties a59471e 
>   src/docs/man/import-args.txt a4ce4ec 
>   src/docs/man/sqoop-import-all-tables.txt 6b639f5 
>   src/docs/user/hcatalog.txt cd1dde3 
>   src/docs/user/help.txt a9e1e89 
>   src/docs/user/import-all-tables.txt 60645f1 
>   src/docs/user/import.txt 192e97e 
>   src/java/com/cloudera/sqoop/SqoopOptions.java ffec2dc 
>   src/java/org/apache/sqoop/lib/SqoopGenericRecord.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 6dcfebb 
>   src/java/org/apache/sqoop/mapreduce/ParquetImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/ParquetJob.java PRE-CREATION 
>   src/java/org/apache/sqoop/orm/ClassWriter.java 94ff576 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java b77b1ea 
>   src/java/org/apache/sqoop/tool/ImportTool.java a3a2d0d 
>   src/java/org/apache/sqoop/util/AppendUtils.java 5eaaa95 
>   src/licenses/LICENSE-BIN.txt 4215d26 
>   src/test/com/cloudera/sqoop/TestParquetImport.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/24223/diff/
> 
> 
> Testing
> -------
> 
> Included 4 test cases. All of them are executed successfully.
> 
> 
> Thanks,
> 
> Qian Xu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message