sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Szabolcs Vasas <vasas.szabo...@gmail.com>
Subject Re: Review Request 67628: Implement an alternative solution for Parquet reading and writing
Date Tue, 26 Jun 2018 09:15:15 GMT


> On June 25, 2018, 2:03 p.m., Fero Szabo wrote:
> > src/test/org/apache/sqoop/TestParquetExport.java
> > Line 69 (original), 66 (patched)
> > <https://reviews.apache.org/r/67628/diff/1-2/?file=2041366#file2041366line69>
> >
> >     If the intention here is to run the same test for every implementation, then
it might make sense to reference the enum class again:
> >     
> >     ParquetJobConfiguratorImplementation.values()
> >     
> >     .. and automagically have a test for future imlementations.

My idea here is to decouple the constants used in the tests and the constants/enum used in
the production code.
Let's say this goes to production and the users start to use the "kite" and the "hadoop" properties
but somehow the value of the enum changes this tests will fail and show that the interface
has changed. However if I use ParquetJobConfiguratorImplementation.values() the test will
succeed but the users can have some broken jobs.


> On June 25, 2018, 2:03 p.m., Fero Szabo wrote:
> > src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
> > Lines 104 (patched)
> > <https://reviews.apache.org/r/67628/diff/2/?file=2044420#file2044420line104>
> >
> >     Is null sometimes OK here?
> >     
> >     I think it will cause an NPE later down the call stack in org.apache.sqoop.avro.AvroUtil#getFileToTest
> >     
> >     Though NPEs are straightforward to fix, so up to you whether you want to throw
something here or not.

I have added some comments here, I hope it clarifies it a bit.


> On June 25, 2018, 2:03 p.m., Fero Szabo wrote:
> > src/java/org/apache/sqoop/tool/BaseSqoopTool.java
> > Line 1912 (original), 1920 (patched)
> > <https://reviews.apache.org/r/67628/diff/2/?file=2044429#file2044429line1920>
> >
> >     Where do optionValue and propertyValue come from?
> >     
> >     My guess is that option comes from a --flag, and property from a -D... property.
I guess that the user can somehow declare them in the site.xml as well. (?)
> >     
> >     Might be an issue:
> >     I see that one overrides the other. Why the duplication? Did you document the
precedence order?

Correct, option value comes from --parquet-configurator-implementation while the propertyValue
comes from -Dparquetjob.configurator.implementation or the site.xml. The -D value overrides
the value defined in the site.xml (this is how Hadoop works) and the convention in Sqoop is
that the --flag overrides the property value. I will raise a separate JIRA for all the documentation
tasks and mention this behavior there.


> On June 25, 2018, 2:03 p.m., Fero Szabo wrote:
> > src/java/org/apache/sqoop/tool/BaseSqoopTool.java
> > Lines 1935 (patched)
> > <https://reviews.apache.org/r/67628/diff/2/?file=2044429#file2044429line1935>
> >
> >     To me this looks like a small detail that can be forgotten to be updated if
/ when a new implementation is added. 
> >     
> >     To avoid it, you might consider using this for supperted values:
> >     Arrays.toString(ParquetJobConfiguratorImplementation.values()

Nice catch, fixed.


- Szabolcs


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67628/#review205279
-----------------------------------------------------------


On June 26, 2018, 9:15 a.m., Szabolcs Vasas wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67628/
> -----------------------------------------------------------
> 
> (Updated June 26, 2018, 9:15 a.m.)
> 
> 
> Review request for Sqoop.
> 
> 
> Bugs: SQOOP-3328
>     https://issues.apache.org/jira/browse/SQOOP-3328
> 
> 
> Repository: sqoop-trunk
> 
> 
> Description
> -------
> 
> The new implementation uses classes from parquet.hadoop packages.
> TestParquetIncrementalImportMerge has been introduced to cover some gaps we had in the
Parquet merge support.
> The test infrastructure is also modified a bit which was needed because of TestParquetIncrementalImportMerge.
> 
> Note that this JIRA does not cover the Hive Parquet import support I will create another
JIRA for that.
> 
> 
> Diffs
> -----
> 
>   src/java/org/apache/sqoop/SqoopOptions.java d9984af369f901c782b1a74294291819e7d13cdd

>   src/java/org/apache/sqoop/avro/AvroUtil.java 57c2062568778c5bb53cd4118ce4f030e4ff33f2

>   src/java/org/apache/sqoop/manager/ConnManager.java c80dd5d9cbaa9b114c12b693e9a686d2cbbe51a3

>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java 3b5421028d3006e790ed4b711a06dbdb4035b8a0

>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 17c9ed39b1e613a6df36b54cd5395b80e5f8fb0b

>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetConstants.java ae53a96bddc523a52384715dd97705dc3d9db607

>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetExportJobConfigurator.java 8d7b87f6d6832ce8d81d995af4c4bd5eeae38e1b

>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetImportJobConfigurator.java fa1bc7d1395fbbbceb3cb72802675aebfdb27898

>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactory.java ed5103f1d84540ef2fa5de60599e94aa69156abe

>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorFactoryProvider.java
2286a52030778925349ebb32c165ac062679ff71 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetJobConfiguratorImplementation.java
PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/parquet/ParquetMergeJobConfigurator.java 67fdf6602bcbc6c091e1e9bf4176e56658ce5222

>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportJobConfigurator.java
PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetExportMapper.java PRE-CREATION

>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportJobConfigurator.java
PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetImportMapper.java PRE-CREATION

>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetJobConfiguratorFactory.java
PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/parquet/hadoop/HadoopParquetMergeJobConfigurator.java
PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteMergeParquetReducer.java 7f21205e1c4be4200f7248d3f1c8513e0c8e490c

>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportJobConfigurator.java
ca02c7bdcaf2fa981e15a6a96b111dec38ba2b25 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetExportMapper.java 25555d88a9c8ea4eb32001e1eb03e636d9386719

>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportJobConfigurator.java
87828d1413eb71761aed44ad3b138535692f9c97 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetImportMapper.java 20adf6e422cc4b661a74c8def114d44a14787fc6

>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetJobConfiguratorFactory.java
055e1166b07aeef711cd162052791500368c628d 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetMergeJobConfigurator.java
9fecf282885f7aeac011a66f7d5d05512624976f 
>   src/java/org/apache/sqoop/mapreduce/parquet/kite/KiteParquetUtils.java e68bba90d8b08ac3978fcc9ccae612bdf02388e8

>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java c62ee98c2b22d819c9a994884b254f76eb518b6a

>   src/java/org/apache/sqoop/tool/ImportTool.java 2c474b7eeeff02b59204e4baca8554d668b6c61e

>   src/java/org/apache/sqoop/tool/MergeTool.java 4c20f7d151514b26a098dafdc1ee265cbde5ad20

>   src/test/org/apache/sqoop/TestBigDecimalExport.java ccea17345c0c8a2bdb7c8fd141f37e3c822ee41e

>   src/test/org/apache/sqoop/TestMerge.java 11806fea6c59ea897bc1aa23f6657ed172d093d5 
>   src/test/org/apache/sqoop/TestParquetExport.java 43dabb57b7862b607490369e09b197b6de65a147

>   src/test/org/apache/sqoop/TestParquetImport.java 27d407aa3f9f2781f675294fa98431bc46f3dcfa

>   src/test/org/apache/sqoop/TestParquetIncrementalImportMerge.java PRE-CREATION 
>   src/test/org/apache/sqoop/TestSqoopOptions.java bb7c20ddcb8fb5fc9c3b1edfb73fecb739bba269

>   src/test/org/apache/sqoop/hive/TestHiveServer2TextImport.java f6d591b73373fdf33b27202cb8116025fb694ef1

>   src/test/org/apache/sqoop/testutil/BaseSqoopTestCase.java a5f85a06ba21b01e99c1655450d36016c2901cc0

>   src/test/org/apache/sqoop/testutil/ImportJobTestCase.java dbefe209770885063d1b4d0c3940d078b8d91cad

>   src/test/org/apache/sqoop/tool/TestBaseSqoopTool.java 01ad15013d5181c94c390ba0e360e85460cb1983

>   src/test/org/apache/sqoop/util/ParquetReader.java 56e03a06094cd29a129b31ea8688b85e8d6f8f5f

> 
> 
> Diff: https://reviews.apache.org/r/67628/diff/3/
> 
> 
> Testing
> -------
> 
> Ran unit and third party tests successfully.
> 
> 
> Thanks,
> 
> Szabolcs Vasas
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message