sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request: Sqoop2: Export sub directories option
Date Sun, 09 Jun 2013 15:39:38 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11588/#review21503
-----------------------------------------------------------


Hi Vasanth,
thank you very much for working on this! I do have couple of additional notes:

I do have a small concerns about the test case TestHdfsExtract that seems to be written in
a way that the context objects are not passed into the job conf. I think that this is artifact
from early days that we should fix in a nice manner as is very likely that we will have to
add more inputs in the future.


core/src/main/java/org/apache/sqoop/framework/configuration/InputForm.java
<https://reviews.apache.org/r/11588/#comment44503>

    It seems to me that using Enum for simple boolean input is unnecessary overhead. We are
adding support for Boolen model type in SQOOP-1076, but the patch is not yet committed. Do
you think that it would make sense to wait and use that instead of the Enum?



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44504>

    It seems that boolean is more appropriate data type for this variable.



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44694>

    It seems that the code here will firstly recursively traverse through the file system
to find all directories, store them in memory and then iterate over the stored directories
to get list of files. Wouldn't be faster and less memory intensive to do both passes at the
same time? E.g. traverse though the filesystem and build directly the list of files?



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
<https://reviews.apache.org/r/11588/#comment44695>

    Similarly as above, would be simple and more effective to calculate the size of all files
directly rather than building list of all directories in memory first?


Jarcec

- Jarek Cecho


On June 4, 2013, 11:58 a.m., vasanthkumar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11588/
> -----------------------------------------------------------
> 
> (Updated June 4, 2013, 11:58 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Description
> -------
> 
> 
> Introducing new option for exporting the sub directories of the given input directory.
> 
> Added new input 
> Input configuration
> 
> Input directory: /input
> Recursive: 
>   0 : NO
>   1 : YES
> Choose: 0
> 
> 
> This addresses bug SQOOP-1063.
>     https://issues.apache.org/jira/browse/SQOOP-1063
> 
> 
> Diffs
> -----
> 
>   core/src/main/java/org/apache/sqoop/framework/configuration/InputForm.java d5cbeec

>   core/src/main/resources/framework-resources.properties cebc90e 
>   docs/src/site/sphinx/ClientAPI.rst 4f3fda6 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportPartitioner.java
b3590dc 
>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b3e6050

>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
0e8c9f7 
> 
> Diff: https://reviews.apache.org/r/11588/diff/
> 
> 
> Testing
> -------
> 
> Done
> 
> 
> Thanks,
> 
> vasanthkumar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message