sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Venkat Ranganathan" <n....@live.com>
Subject Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog
Date Tue, 21 May 2013 00:35:00 GMT


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > build.xml, line 54
> > <https://reviews.apache.org/r/10688/diff/5/?file=288026#file288026line54>
> >
> >     I'm not feeling entirely comfortable about depending on SNAPSHOTS. Is there
a particular feature that we're taking advantage of in 0.6.0 that is not in 0.5.0?

No, the functionality (from the contract point of view) is even compatible with 0.4.0 I think.
  I could not successfully resolve the maven repos for the earlier versions and hence I had
to switch to it.   I think now I tried to build and found that only 0.11.0 is available readily
at repos.maven.org.   That was the reason.  I will update and switch to 0.5.0 if that version
is available in the repos.   But given that we want to have readily available Hadoop 2 and
Hadoop 1 artifacts, we may have to set to 0.11.0 assuming that is the version the HCatalog
team decides to publish the repositories for.


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/SqoopOptions.java, line 160
> > <https://reviews.apache.org/r/10688/diff/5/?file=288029#file288029line160>
> >
> >     Out of curiosity what the "stanza" stands for?

Stanza means paragraph :)   We used this a lot earlier in my work to describe the SQL snippets
when ]we write essays to describe what we want from the database.   May be clause is a more
general DB term.


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/config/ConfigurationConstants.java, lines 66-67
> > <https://reviews.apache.org/r/10688/diff/5/?file=288030#file288030line66>
> >
> >     Does the new property make sense when it's valid only on Hadoop2 that actually
do not have any JobTracker address at all? We already had issues with that on Sqoop2 side
in SQOOP-1002.

I just wanted to consolidate all constants.  If we can remove it, that would be OK


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/config/ConfigurationConstants.java, lines 90-91
> > <https://reviews.apache.org/r/10688/diff/5/?file=288030#file288030line90>
> >
> >     Nit: Please put extra empty line between the property name and private constructor.

Thanks Will fix


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java, lines 377-379
> > <https://reviews.apache.org/r/10688/diff/5/?file=288052#file288052line377>
> >
> >     Nit: Incorrect indentation.

Thanks will fix


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java, lines 368-370
> > <https://reviews.apache.org/r/10688/diff/5/?file=288052#file288052line368>
> >
> >     Nit: Incorrect indentation.

Thanks  will fix


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 849
> > <https://reviews.apache.org/r/10688/diff/5/?file=288043#file288043line849>
> >
> >     I think that we also want to skip invoking the output committer in case of hadoop
2.

Good catch.  I will modify the version check to just say if it is Hadoop2

Thanks


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java, lines 76-77
> > <https://reviews.apache.org/r/10688/diff/5/?file=288039#file288039line76>
> >
> >     Nit: Those extra lines seems unnecessary here because the next row also have
type constant.

Will remove the extra lines - thanks


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, line 389
> > <https://reviews.apache.org/r/10688/diff/5/?file=288034#file288034line389>
> >
> >     This comment seems to be artifact from development. I would suggest to improve
the message and move it into "debug" state in case that we would like to have it around.

Will remove


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java, lines 68-74
> > <https://reviews.apache.org/r/10688/diff/5/?file=288033#file288033line68>
> >
> >     I'm thinking if having subclass of the DataDrivenImportJob for HCat specific
things that would override this and couple of other methods would be cleaner than having multiple
if-else statements. What do you think Venkat?

Yes,   Good point.   It makes sense (and that is what I started with), but since I had to
change the ExportJob in place, I used a similar scheme for consistency and followed what we
did for other storage formats like Avro.   Also, this will make sure that irrespective of
the changes to import job (on what it gets its feed from), we will be able to send it to HCat.


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, lines 202-204
> > <https://reviews.apache.org/r/10688/diff/5/?file=288034#file288034line202>
> >
> >     Similarly as in the import. Would having dedicated classes for HCatalog make
sense/would be cleaner that having one class for everything and having multiple if-else statements?

Good point Jarek.  Actually I had that implementation first - but then we will not be able
to support update/upsert and call by procedure would  need to be modified to handle the HCat
format.   Since we were using HCat more as storage format like Avro, I decided to implement
in place.  And followed similar logic for Imports as well


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/manager/ConnManager.java, line 197
> > <https://reviews.apache.org/r/10688/diff/5/?file=288032#file288032line197>
> >
> >     Is the timestamp mapped to String from similar reason as mentioned above with
SMALLINT?

Timestamp is currently not a supported datatype in HCat (even though Hive supports it).  
I will create a JIRA issue on HCat to support that now that HCatalog is a sub project of Hive.


> On May 20, 2013, 1:02 p.m., Jarek Cecho wrote:
> > ivy.xml, lines 185-193
> > <https://reviews.apache.org/r/10688/diff/5/?file=288027#file288027line185>
> >
> >     Shouldn't those two dependencies be transitively propagated from HCatalog/Hive?

I had an issue building without the explicit dependency listed - may be because the repos
were not having all the artifacts and the data nucleus was only available from datanucleus
repository.   I will try to remove the dependency and retry.   
Thanks


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review20756
-----------------------------------------------------------


On May 4, 2013, 11:46 p.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated May 4, 2013, 11:46 p.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop.   With this
feature, it is possible to import and export data between Sqoop and HCatalog tables.   The
document attached to SQOOP-931 JIRA issue discusses the high level appraches.  
> 
> With this integration, more fidelity can be brought to the process of moving data between
enterprise data stores and hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 1c33fee 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 9417d57 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 10f0cb9 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogExport.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogImport.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test
various aspects of the integration.  A unit test to test the option management is also added.
  All tests pass
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message