sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zoltán Tóth <swank.vali...@gmail.com>
Subject Re: Review Request 61522: SQOOP-2907 : Export parquet files to RDBMS: don't require .metadata for parquet files
Date Mon, 02 Oct 2017 15:18:36 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/61522/#review186844
-----------------------------------------------------------



Hey Sandish,

Thanks for your contribution. I am pretty sure your work will solve a lot of headache for
a lot of people.

I'd some findings so I added my comments to the code. Please read them and put your comments/ideas
to it.

Cheers, Zoli


src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 194 (patched)
<https://reviews.apache.org/r/61522/#comment263683>

    If it is only Parquet file related then what is this change in AvroUtils. I haven't checked
why it is necessary so please explain why does this change needed.



src/java/org/apache/sqoop/avro/AvroUtil.java
Lines 195 (patched)
<https://reviews.apache.org/r/61522/#comment263684>

    If this part is needed then you can use Java Charset to load UTF-8 so it won't throw IOException.



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 100 (patched)
<https://reviews.apache.org/r/61522/#comment263688>

    It is called here double time. Is it okay?



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 107 (patched)
<https://reviews.apache.org/r/61522/#comment263690>

    Why do you use double log.warn here? You can merge it to one.



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 111 (patched)
<https://reviews.apache.org/r/61522/#comment263689>

    Why don't use the catch block for this code snippet?



src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java
Lines 120 (patched)
<https://reviews.apache.org/r/61522/#comment263687>

    I think this shouldn't be public. If you use it only inside of this class then make it
private. If you would like to use it outside of this class then a getter would make more sense.
    
    It is also should be somewhere begining of the class because it is a static final variable.
You can intialize it in a static block.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 165 (patched)
<https://reviews.apache.org/r/61522/#comment263678>

    This method is almost copy of the previous one. Please avoid duplicates and put the common
parts into one method.
    
    And please also change the comment of the method becuase that is the same but the two
method doing different thing.
    
    fileNum parameter is not used in the method, please remove it.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 226 (patched)
<https://reviews.apache.org/r/61522/#comment263680>

    Please do not copy and paste code parts which is already in the class. It increases duplication
which makes harder to read and maintain later.
    
    testSupportedParquetTypes() method also contains this part of the code. Put them into
a different method like createParquetTestFileContent()



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 232 (patched)
<https://reviews.apache.org/r/61522/#comment263679>

    We don't have an official formatter yet but I think a 120 columns length row is better
than 80.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 475 (patched)
<https://reviews.apache.org/r/61522/#comment263681>

    Another duplication



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 527 (patched)
<https://reviews.apache.org/r/61522/#comment263685>

    Please specify the expected exception here. Your test should have one good result only.
Pleas avoid unnecessary usage of general exceptions.
    
    Use expected exception instead of try - catch. It makes the code more readable.



src/test/com/cloudera/sqoop/TestParquetExport.java
Lines 614 (patched)
<https://reviews.apache.org/r/61522/#comment263686>

    Same here. Please specify the exception and use expected exception instead of try - catch
blocks.


- Zoltán Tóth


On Aug. 9, 2017, 10:53 a.m., Sandish Kumar HN wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/61522/
> -----------------------------------------------------------
> 
> (Updated Aug. 9, 2017, 10:53 a.m.)
> 
> 
> Review request for Sqoop and Anna Szonyi.
> 
> 
> Bugs: SQOOP-2907
>     https://issues.apache.org/jira/browse/SQOOP-2907
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> Kite currently requires .metadata.
> Parquet files have their own metadata stored along data files.
> It would be great for Export operation on parquet files to RDBMS not to require .metadata.
> We have most of the files created by Spark and Hive, and they don't create .metadata,
it only Kite that does.
> It makes sqoop export of parquet files usability very limited.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/avro/AvroUtil.java ee29f140 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 6f9afaf9 
>   src/test/com/cloudera/sqoop/TestParquetExport.java 680fd73b 
> 
> 
> Diff: https://reviews.apache.org/r/61522/diff/1/
> 
> 
> Testing
> -------
> 
> testSupportedParquetTypesForWithoutParquetMeta - done
> testNullableFieldWithoutParquetMeta - done
> testParquetRecordsNotSupportedWithoutParquetMeta -done
> testMissingDatabaseFieldsWithoutParquetMeta - done
> testMissingParquetFieldsWithoutParquetMeta - done
> 
> 
> Thanks,
> 
> Sandish Kumar HN
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message