sqoop-dev mailing list archives

From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog
Date Wed, 05 Jun 2013 21:26:36 GMT


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > Hi Venkat,
> > Thank you for incorporating my comments, greatly appreciated. I've taken another deep look and I have the following additional comments:
> > 
> > 1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
> > 
> > 2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without doing the import:
> > 
> > [root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
> > 13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line
is insecure. Consider using -P instead.
> > 13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming
resultset.
> > 13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
> > 13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
> > Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or
overrides a deprecated API.
> > Note: Recompile with -Xlint:deprecation for details.
> > 13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
> > 13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from
mysql.
> > 13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the
--direct
> > 13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific
fast path.
> > 13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull
(mysql)
> > 13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import
job
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details
for job
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail
if needed jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide
--hive-home option.  Setting HIVE_HOME  to /usr/lib/hive
> > 13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may
fail if needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or
provide --hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
> > 13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM
`text` AS t LIMIT 1
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected
: [id, txt]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map
:
> >         Names: [id, txt]
> >         Types : [4, 12]
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text
for import
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement:

> > 
> > create table default.text (
> >         id int,
> >         txt string)
> > stored as rcfile
> > 13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
> > Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
> > OK
> > Time taken: 25.121 seconds
> > [root@bousa-hcat ~]#
> > 
> >
> 
> Venkat Ranganathan wrote:
>     Sure, I can add it to that.
>     
>     --create-hcatalog-table: it seems to work by chance. That is, after the table is created, a bunch of work is done that is not needed. I will add additional checks there.
> 
> Venkat Ranganathan wrote:
>     Sorry, I misunderstood your observation. There is even a test case to test this. What I thought you said was that just using --create-hcatalog-table also works like the --create-hive-table option without a Hive import. Let me recheck this.
>     
>     Thanks

Hi Venkat,
please accept my apology for the confusion, and let me explain a bit better. I've noticed
that when I use the --create-hcatalog-table parameter, the logger gets reconfigured
and no Sqoop log output is available after the table is created. Notice that there is no
log output after the "Time taken..." line.
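
The general pattern for guarding against an in-process call that reconfigures logging can be sketched as below. This is only an illustration under stated assumptions: it uses java.util.logging from the JDK as a stand-in, not the logging framework that Sqoop and Hive actually use, and the names LoggingGuard and runPreservingRootLogger are hypothetical and not part of the patch.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of SQOOP-931.
final class LoggingGuard {

    // Runs the task, then restores the root logger's level and handlers
    // to what they were before the task ran.
    static void runPreservingRootLogger(Runnable task) {
        Logger root = Logger.getLogger("");
        Level savedLevel = root.getLevel();
        Handler[] savedHandlers = root.getHandlers();
        try {
            task.run();
        } finally {
            // Drop whatever handlers the task left behind.
            for (Handler h : root.getHandlers()) {
                root.removeHandler(h);
            }
            // Reinstall the handlers captured before the task ran.
            for (Handler h : savedHandlers) {
                root.addHandler(h);
            }
            root.setLevel(savedLevel);
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        runPreservingRootLogger(() -> {
            // Stand-in for an in-process CLI that silences logging.
            Logger.getLogger("").setLevel(Level.OFF);
        });
        log.info("Still visible after the in-process call");
    }
}
```

Whether the same snapshot-and-restore approach is the right fix for the in-process HCatalog CLI is an open question; the sketch only shows the shape of the idea.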


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java, lines 131-137
> > <https://reviews.apache.org/r/10688/diff/9/?file=299874#file299874line131>
> >
> >     This method seems to be required only for the debug message. Is it the only
reason or did I miss something?
> 
> Venkat Ranganathan wrote:
>     Yes, it is needed for debugging purposes, when we want to know when the sub record reader or the main record reader is called.

I see, thank you.


> On June 4, 2013, 11:15 p.m., Jarek Cecho wrote:
> > src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java, line 523
> > <https://reviews.apache.org/r/10688/diff/9/?file=299879#file299879line523>
> >
> >     It seems that at this point we are not reading the Hive configuration files, yet we are executing the in-process Hive CLI. As a result, the CLI will not pick up the configuration files and will use defaults, which is not consistent with the executed MapReduce job, which will use the proper configuration files. The table will therefore be created in a different metastore than the one into which we are importing data.
> 
> Venkat Ranganathan wrote:
>      Hive and HCatalog configuration files and jars have to be in the classpath brought in by hcat -classpath. Do you think that is not always in the configuration? When I update the configure-sqoop script, I will make sure the Hive conf is added.

Yeah, it seems that HCatalog 0.5.0 is not putting the Hive configuration directory on the
classpath, at least in my environment.
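
One way to confirm this in a given environment is to ask the JVM whether the Hive configuration file is visible on the classpath. The sketch below is illustrative only: the class name HiveConfCheck and the helper locate are hypothetical and not part of the patch, and it assumes the configuration file is named hive-site.xml (the name used under testdata/hcatalog/conf in the diff).

```java
import java.net.URL;

// Hypothetical diagnostic, not part of SQOOP-931.
final class HiveConfCheck {

    // Returns the location of the named resource on the classpath,
    // or null if no class loader in the chain can see it.
    static URL locate(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = HiveConfCheck.class.getClassLoader();
        }
        return cl.getResource(resource);
    }

    public static void main(String[] args) {
        URL url = locate("hive-site.xml");
        if (url == null) {
            System.out.println("hive-site.xml is not on the classpath; "
                + "an in-process client would fall back to defaults");
        } else {
            System.out.println("hive-site.xml found at " + url);
        }
    }
}
```

To be meaningful, the check has to run with the same classpath that the Sqoop process ends up with; running it with a different classpath says nothing about the import job.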


- Jarek


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this feature, it is possible to import and export data between Sqoop and HCatalog tables. The document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test various aspects of the integration. A unit test for the option management has also been added. All tests pass.
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>

