sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request 16812: SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
Date Thu, 06 Feb 2014 00:25:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16812/#review33500
-----------------------------------------------------------


Hi Abe,
thank you very much for taking up this huge item. I've done quite a deep dive into the patch
and I do have several notes:


common/src/main/java/org/apache/sqoop/schema/type/Column.java
<https://reviews.apache.org/r/16812/#comment63262>

    This currently do not seems to be used, so perhaps we can address it in separate JIRA?



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
<https://reviews.apache.org/r/16812/#comment63303>

    This fragment seems to be very similar to the one that is available in ImportInitializer,
so I'm wondering if we can further abstract it to the Util class.



connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
<https://reviews.apache.org/r/16812/#comment63261>

    Might I suggest to get this refactoring done in separate JIRA?
    
    I would also suggest to move this code to the connector SDK (with it's dependencies if
needed).



connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
<https://reviews.apache.org/r/16812/#comment62971>

    Nit: Let's remove the generated comment.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AbstractIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63236>

    Nit: Since we are not much using the interfaces with abstract implementations we tend
not to use "Abstract" keyword in the class names.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AbstractIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63237>

    This functionality seems to be here only for purpose of the child implementation (CSVIntermediateDataFormat),
so I'm wondering if we should not move it down to that class only.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AbstractIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63238>

    This functionality seems to be here only for purpose of the child implementation (CSVIntermediateDataFormat),
so I'm wondering if we should not move it down to that class only.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AbstractIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63239>

    Do not seems to be used anywhere. If it's just plan for the future, I would suggest to
do it in separate standalone JIRA.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63371>

    Nit: Do you think that US_ASCII would serve our purpose better?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63243>

    I'm trying to see why we are setting the "\0" for the escape character - are we trying
to disable the escape all together?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63244>

    It seems worth of throwing an exception.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63188>

    Why we are removing the first one?



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63245>

    It seems worth of throwing an exception.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63249>

    Do you think that we can take advantage of the CSV reader to "unescape" the string and
binary field for us? To skip some of the operations...



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63189>

    Converting Decimal to Double is not a valid operation. Decimal is a fixed point data type
whereas double is floating point. Hence they have different use case and behavior.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63190>

    Probably all right for the first implementation, but we should considering implementing
most of the supported data types eventually. Probably a good candidate for follow up JIRA.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63191>

    We are writing bytes, but reading UTF string - those two are not necessary the same, I
think that we should read the data same way we have written them down.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63192>

    Not sure that swallowing the exception is the best case, what about throwing SqoopException?
(it's an unchecked exception).



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63193>

    Please throw SqoopException instead.



connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
<https://reviews.apache.org/r/16812/#comment63194>

    Let's throw an exception here.



connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormatTest.java
<https://reviews.apache.org/r/16812/#comment63258>

    Can we convert this to the new format org.junit.Assert?



connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormatTest.java
<https://reviews.apache.org/r/16812/#comment63259>

    Rather then extending the TestCase, let's use the @Test annotation.



connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormatTest.java
<https://reviews.apache.org/r/16812/#comment63260>

    Nit: Unused imports.



connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormatTest.java
<https://reviews.apache.org/r/16812/#comment63250>

    I would suggest to add following tests:
    
    * Create byte array with all 256 possibilities and try the serialization/deserialization
    * Create String with all characters that we are usually replacing and try the serialization/deserialization



core/src/main/java/org/apache/sqoop/framework/JobManager.java
<https://reviews.apache.org/r/16812/#comment63291>

    Nit: This seems as unused import.



core/src/main/java/org/apache/sqoop/framework/JobManager.java
<https://reviews.apache.org/r/16812/#comment63228>

    Nit: Do we need to have the variable explicitly defined? Can we just do:
    
    request.setDataFormat(connector.getIntermediateDataFormat()).
    
    Nit nit: I would call the name of the method "setIntermediateDataFormat" so that it's
aligned with the "get" method.



core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
<https://reviews.apache.org/r/16812/#comment63229>

    I would suggest to define the variable as "Class" and not convert it into String at all.



execution/mapreduce/pom.xml
<https://reviews.apache.org/r/16812/#comment63295>

    I believe that the log4j should not be required here as it's defined in the root pom.xml
file for all sub modules.



execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
<https://reviews.apache.org/r/16812/#comment63372>

    There seems to be a lot of duplicates between import and export prepare method. Do you
think that we could abstract shared behavior to single method? If so, I would suggest to address
it in separate JIRA.



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
<https://reviews.apache.org/r/16812/#comment63375>

    It seems that by this patch entire "Data" class becomes slightly obsolete, so I'm thinking
if it would make sense to remove it altogether?



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
<https://reviews.apache.org/r/16812/#comment63289>

    Nit: Unused import.



execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
<https://reviews.apache.org/r/16812/#comment63376>

    Can we make a test for this class to verify that serialization of most troublesome combinations
will be alright? I'm worried that serializing the data as UTF might not work in case that
we want to store our binary format...



execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
<https://reviews.apache.org/r/16812/#comment63374>

    This seems a bit troublesome to me to force the IDF extract the data into String where
we do not necessary have to.  What about using the IDF itself to call the write() and readFields()?
    
    I've looked into other Writables and it seems that if we will make the Writable also Configurable,
Hadoop will pass the config object for us before any serialization/deserialization. We should
try it out and see if that will help us avoid the unnecessary serialization. Check out GenericWritable
as an example:
    
    http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/GenericWritable.html



pom.xml
<https://reviews.apache.org/r/16812/#comment63294>

    The dependency section on the root pom.xml file is only for items that are required by
all modules. Hence the CSV dependency should rather go to dependencyManagement section and
then to corresponding submodule.



spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
<https://reviews.apache.org/r/16812/#comment63235>

    I would define this method as a concrete and return the CSVInputFormat by default.



submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
<https://reviews.apache.org/r/16812/#comment63293>

    This seems to be a stand alone bug, would you mind addressing it in standalone JIRA?


Jarcec

- Jarek Cecho


On Jan. 30, 2014, 3:17 a.m., Abraham Elmahrek wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16812/
> -----------------------------------------------------------
> 
> (Updated Jan. 30, 2014, 3:17 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-777
>     https://issues.apache.org/jira/browse/SQOOP-777
> 
> 
> Repository: sqoop-sqoop2
> 
> 
> Description
> -------
> 
> Previous review from hari is in first iteration of this review only.
> 
> commit 1c5122611fbd4a46a629421f7c55746cfc14f136
> Author: Hari Shreedharan <hshreedharan@apache.org>
> Date:   Fri Jan 10 15:07:08 2014 -0800
> 
>     SQOOP-777. Sqoop2: Pluggable Intermediate Data Format
>     
>     Implemented a pluggable intermediate data format that decouples the internal representation
of the data from the connector and the output formats. Connectors can choose to implement
and support a format that is more efficient for them. Also separated the SqoopWritable so
th
>     
>     I ran a full build - all tests including integration tests pass. I have not added
any new tests, yet. I will add unit tests for the new classes. Also, I have not tried running
this on an actual cluster - so things may be broken. I'd like some initial feedback based
on the c
>     
>     I also implemented escaping of characters. There is some work remaining to support
binary format, but it is mostly integration, the basic implementation is in place.
> 
> :100644 100644 db11b5b... 9bfa07d... M  common/pom.xml
> :100644 100644 3e1adc7... f971240... M  common/src/main/java/org/apache/sqoop/etl/io/DataReader.java
> :100644 100644 d81364e... e547875... M  common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java
> :100644 100644 8b630b2... 30c26a3... M  common/src/main/java/org/apache/sqoop/schema/type/Column.java
> :100644 100644 e0da80f... 9c70db9... M  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java
> :100644 100644 ef39cdc... 075890f... M  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
> :100644 100644 96818ba... 795ffdb... M  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
> :000000 100644 0000000... 6c33423... A  connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
> :100644 100644 aa1c4ff... 7220018... M  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
> :100644 100644 a7ed6ba... 28399f2... M  connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java
> :100644 100644 4056e14... f54837d... M  connector/connector-sdk/pom.xml
> :000000 100644 0000000... d4874f2... A  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/CSVIntermediateDataFormat.java
> :000000 100644 0000000... 63e14d2... A  connector/connector-sdk/src/main/java/org/apache/sqoop/connector/IntermediateDataFormat.java
> :000000 100644 0000000... 6e5479f... A  connector/connector-sdk/src/test/java/org/apache/sqoop/connector/CSVIntermediateDataFormatTest.java
> :100644 100644 e052584... 3b2ef94... M  core/src/main/java/org/apache/sqoop/framework/JobManager.java
> :100644 100644 53d0039... 9f5c47d... M  core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java
> :100644 100644 f9a2a0e... 9754afd... M  execution/mapreduce/pom.xml
> :100644 100644 5c0a027... 7680e33... M  execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
> :100644 100644 7fd9a01... 3604898... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java
> :100644 100644 1978ec6... 099cdd3... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
> :100644 100644 a07c511... ee6bf39... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
> :100644 100644 4621942... 7b799ca... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
> :000000 100644 0000000... 71ae980... A  execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java
> :100644 100644 356ae8a... b495cc9... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
> :100644 100644 92de37e... 8164ffe... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java
> :100644 100644 90de6ef... b3503da... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
> :100644 100644 7dedee9... 16e59d8... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
> :100644 100644 98a2c51... 5aaceb3... M  execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java
> :100644 100644 e21f15b... 4e9e6ea... M  execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java
> :100644 100644 b7079dd... ddda423... M  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java
> :100644 100644 f849aae... 2968411... M  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java
> :100644 100644 7b264c6... 954990f... M  execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java
> :000000 100644 0000000... aea7de3... A  execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java
> :100644 100644 bee8ab7... 663dfb5... M  execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java
> :100644 100644 1e2f005... a722c74... M  pom.xml
> :100644 100644 0b240e8... 43f17d4... M  spi/pom.xml
> :100644 100644 2becc56... 298e8f5... M  spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java
> :100644 100644 6fc485b... 8d1a4e8... M  submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
> 
> 
> Diffs
> -----
> 
>   common/pom.xml db11b5ba70093704bc3f7a0a98cf7d44172e2b7d 
>   common/src/main/java/org/apache/sqoop/etl/io/DataReader.java 3e1adc7084cf9ef1b93c8146110c751c7de34376

>   common/src/main/java/org/apache/sqoop/etl/io/DataWriter.java d81364ef81bc747437f5b78520bacec8ee2613a3

>   common/src/main/java/org/apache/sqoop/schema/type/Column.java 8b630b28295700b204da9f5a948eaef106ca559b

>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcConnector.java
e0da80f77230fb836ef6ab36e3bd867d6cef553a 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcExportInitializer.java
ef39cdcbc41280a0dec338c8c2ab27b7cf76385b 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/GenericJdbcImportInitializer.java
96818ba27b25538643f9c5777ed8abebdf0eb1e9 
>   connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/util/InitializationUtils.java
PRE-CREATION 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestExportLoader.java
aa1c4ff7ff138de5efc5bc1bca5b0d869d214a1c 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportExtractor.java
a7ed6bab95c76c4928414e7850cc3e52a3ce4595 
>   connector/connector-generic-jdbc/src/test/java/org/apache/sqoop/connector/jdbc/TestImportInitializer.java
a33fa3631400c3382a72e6286b1efb3fca0c9cb9 
>   connector/connector-sdk/pom.xml 4056e14978cb5e679233e9fdde11d53f1ca2756b 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/AbstractIntermediateDataFormat.java
PRE-CREATION 
>   connector/connector-sdk/src/main/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormat.java
PRE-CREATION 
>   connector/connector-sdk/src/test/java/org/apache/sqoop/connector/idf/CSVIntermediateDataFormatTest.java
PRE-CREATION 
>   core/src/main/java/org/apache/sqoop/framework/JobManager.java e0525846029bb394b2614a33595ebe6fcdd4aaf2

>   core/src/main/java/org/apache/sqoop/framework/SubmissionRequest.java 53d003980a172e4e0acf18630a3496909c17cb5c

>   execution/mapreduce/pom.xml f9a2a0e57b499cc785bdb6548a2d6113e7b867c0 
>   execution/mapreduce/src/main/java/org/apache/sqoop/execution/mapreduce/MapreduceExecutionEngine.java
5c0a02739a5f7598dc83ab82631b80410ab39213 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/JobConstants.java 7fd9a017140fb91292a88e5a944ae549915c86e4

>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsExportExtractor.java
1978ec61e07beb6b1e75bc4da8c698c738ac4e4c 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsSequenceImportLoader.java
a07c5111e9e037dab4dacb128a53918f61599495 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/etl/HdfsTextImportLoader.java
462194238755e603dd60226a97cf357cf63e0f20 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/io/SqoopWritable.java PRE-CREATION

>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopFileOutputFormat.java
356ae8ab6f3cad13b644e1f033278d9012938956 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopMapper.java 92de37e1b7ffd3c4ce584e4d10c4e1f9af03c369

>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopNullOutputFormat.java
90de6efa5f3145d0480e375d82bd5c6a9760a799 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopOutputFormatLoadExecutor.java
7dedee9f12b11e3af5cf6ee5c81bbc377a04a4fd 
>   execution/mapreduce/src/main/java/org/apache/sqoop/job/mr/SqoopReducer.java 98a2c518b65130ebe272faf721b77099a1e85a63

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/JobUtils.java e21f15b8be2aacb7463ab4c41cfce63cbb71cf57

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsExtract.java b7079dd47c80c776a508b07f0ee5152c1d3fd20e

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestHdfsLoad.java f849aaef10c3b995afbe6326353256affbceee67

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/TestMapReduce.java 7b264c68ae761fbaf91bc51ed312de70b9ba3fd3

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/io/SqoopWritableTest.java PRE-CREATION

>   execution/mapreduce/src/test/java/org/apache/sqoop/job/mr/TestSqoopOutputFormatLoadExecutor.java
bee8ab75f1f8e6508fca905aafe7c67f6b5d8d9a 
>   pom.xml 1e2f0057ca2e9cd635ccbce75e5d6e39eeabb163 
>   spi/pom.xml 0b240e8a1146cc7a06a2c50c181fb73b11e6dd8c 
>   spi/src/main/java/org/apache/sqoop/connector/spi/SqoopConnector.java 2becc56364343f1cffdf91d9b17144666acbdd2e

>   submission/mapreduce/src/main/java/org/apache/sqoop/submission/mapreduce/MapreduceSubmissionEngine.java
6fc485b66ea8c33d09d172675a104954d570f3a7 
> 
> Diff: https://reviews.apache.org/r/16812/diff/
> 
> 
> Testing
> -------
> 
> mvn test
> 
> 
> Thanks,
> 
> Abraham Elmahrek
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message