sqoop-dev mailing list archives

From "Venkat Ranganathan" <n....@live.com>
Subject Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog
Date Wed, 05 Jun 2013 00:09:07 GMT


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've taken a deep look
again and I have the following additional comments:
> > 
> > 1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop
without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop
--username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line
is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming
resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or
overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from
mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the
--direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific
fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull
(mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import
job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details
for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail
if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide
--hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may
fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or
provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected
: [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map
:
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text
for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:

> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >

Sure, I can add it to that.
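Something along these lines should work, assuming ThirdPartyTests keeps its current JUnit
suite() pattern and that the new HCatalog test classes extend TestCase like the existing
ones (sketch only):

    // In src/test/com/cloudera/sqoop/ThirdPartyTests.java
    public static junit.framework.Test suite() {
      junit.framework.TestSuite suite = new junit.framework.TestSuite(
          "Tests that require third-party servers");
      // ... existing third-party test suites stay registered here ...
      suite.addTestSuite(org.apache.sqoop.hcat.HCatalogImportTest.class);
      suite.addTestSuite(org.apache.sqoop.hcat.HCatalogExportTest.class);
      return suite;
    }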

--create-hcatalog-table - it seems to work by chance. That is, after creating the table, a
bunch of work is done that is not needed. I will add additional checks there.
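Roughly, the intent is the following (the method names here are illustrative, not the actual
ones in the patch):

    // Illustrative sketch: create the table when asked, then continue with the
    // normal HCatalog import configuration instead of stopping after the DDL.
    if (options.doCreateHCatalogTable()) {       // hypothetical accessor
      createHCatalogTableIfNeeded(options);      // hypothetical helper issuing the DDL
    }
    configureHCatOutputFormat(job, options);     // hypothetical helper
    // ... the MapReduce import then runs as usual ...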


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/docs/user/hcatalog.txt, lines 284-285
> > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line284>
> >
> >     This seems unnecessary; can we tweak the bash scripts to do this automatically
if the hcat command is present?

Good point. Since I modified the Hive unit tests to function correctly in the presence of a
real Hive environment, this can easily be done.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ExportJobBase.java, line 95
> > <https://reviews.apache.org/r/10688/diff/9/?file=299870#file299870line95>
> >
> >     Nit: I think that this line can also be refactored to the parent class, right?

Yes. One thing to note is that by moving isHCatJob to the parent class we lost the
ability to mark it as final. Let me rework it.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/ImportJobBase.java, line 85
> > <https://reviews.apache.org/r/10688/diff/9/?file=299871#file299871line85>
> >
> >     Nit: I think that this line can also be refactored to the parent class, right?

Please see above


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java, lines 131-137
> > <https://reviews.apache.org/r/10688/diff/9/?file=299874#file299874line131>
> >
> >     This method seems to be required only for the debug message. Is it the only
reason or did I miss something?

Yes, it is needed for debugging purposes, when we want to know when the sub-record reader or
the main record reader is called.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 237-241
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line237>
> >
> >     Nit: It seems that we are doing options = opts; in all cases, so maybe
it would be worth putting this line before the "if" statement?

Sure


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 249
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line249>
> >
> >     Nit: Shouldn't the default Hive home be SqoopOptions.getDefaultHiveHome()?

Yes.   The message needs fixing


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 257
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line257>
> >
> >     Shouldn't the default HCatalog home be SqoopOptions.getDefaultHcatHome()?

Yes.  As above


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 491
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line491>
> >
> >     Both Hive and HBase are idempotent when creating tables, so it might make sense
to add "IF NOT EXISTS" in order to remain consistent.

Good point. I think we would fail earlier otherwise, but for consistency I think we should do
this. Will change.
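The generated statement from the example above would then become:

    create table if not exists default.text (
            id int,
            txt string)
    stored as rcfile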


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 523
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line523>
> >
> >     It seems that at this point we are not reading the Hive configuration files,
yet we are executing the in-process Hive CLI, which as a result will not pick up the configuration
files and will use defaults that are not consistent with the executed MapReduce job, which will
use the proper configuration files. As a result the table will be created in a different metastore
than the one into which we are importing data.

Hive and HCatalog configuration files and jars have to be on the classpath brought in by
hcat -classpath. Do you think that is not always the case? When I update the configure-sqoop
script, I will make sure the Hive conf is added.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 749-750
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line749>
> >
> >     Shouldn't we use SqoopOptions.getDefaultHiveHome() here?

Yes. Will fix.
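That is, something along these lines (a sketch only; the surrounding code in SqoopHCatUtilities
differs, and the accessors are the SqoopOptions ones referenced above):

    // Fall back to the shared Sqoop default instead of a hard-coded path.
    String hiveHome = options.getHiveHome();
    if (hiveHome == null) {
      hiveHome = SqoopOptions.getDefaultHiveHome();
    }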


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 871-875
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line871>
> >
> >     Nit: Those lines seem to be unused.

Good catch - earlier I had the ability to execute a command line, but removed it in favor of
a simpler model. Will remove them.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 876
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line876>
> >
> >     Can we write the file in a temporary directory rather than in the current working
directory (which might not be writable)?

Sure, will change.
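For example, something like this (a sketch; the actual file naming and wiring in the patch will
differ, and createTableStatement is just an illustrative variable):

    // Write the generated HCatalog script to a temp file instead of the
    // current working directory, which may not be writable.
    java.io.File script = java.io.File.createTempFile("hcat-script-", ".hcat");
    script.deleteOnExit();
    java.io.Writer w = new java.io.OutputStreamWriter(
        new java.io.FileOutputStream(script), "UTF-8");
    try {
      w.write(createTableStatement);
    } finally {
      w.close();
    }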


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/docs/user/hcatalog.txt, line 160
> > <https://reviews.apache.org/r/10688/diff/9/?file=299864#file299864line160>
> >
> >     Can we add information here about what will happen if the table already exists and
this parameter is specified?

Sure.   Will do.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 898-899
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line898>
> >
> >     I would suggest altering this to a single line:
> >     
> >     LOG.error("Error writing HCatalog load-in script: ", ioe);
> >     
> >     That will also print the stack trace.

Sure, will do.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, lines 906-907
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line906>
> >
> >     I would suggest changing this line to:
> >     
> >     LOG.warn("IOException closing stream to HCatalog script: ", ioe);
> >     
> >     That will also print out the stack trace.

Sure, will do.


- Venkat


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements a new feature integrating HCatalog and Sqoop. With this
feature, it is possible to import and export data between Sqoop and HCatalog tables. The
document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between
enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites, with more than 20 tests in total, have been added to test
various aspects of the integration. A unit test covering the option management has also been
added. All tests pass.
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>

