sqoop-dev mailing list archives

From "Jarek Cecho" <jar...@apache.org>
Subject Re: Review Request: SQOOP-931 - Integration of Sqoop and HCatalog
Date Tue, 04 Jun 2013 23:15:02 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10688/#review21420
-----------------------------------------------------------


Hi Venkat,
Thank you for incorporating my comments, greatly appreciated. I've taken a deep look again
and I have the following additional comments:

1) Can we add the HCatalog tests into ThirdPartyTest suite? https://github.com/apache/sqoop/blob/trunk/src/test/com/cloudera/sqoop/ThirdPartyTests.java
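For illustration, roughly what I have in mind (a sketch only; the exact registration should
follow the existing entries in the suite, and the class names are the ones from this diff):

    // Hypothetical additions to ThirdPartyTests.suite():
    suite.addTestSuite(org.apache.sqoop.hcat.HCatalogImportTest.class);
    suite.addTestSuite(org.apache.sqoop.hcat.HCatalogExportTest.class);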

2) It seems that using --create-hcatalog-table will create the table and exit Sqoop without
doing the import:

[root@bousa-hcat ~]# sqoop import --connect jdbc:mysql://mysql.ent.cloudera.com/sqoop --username
sqoop --password sqoop --table text --hcatalog-table text --create-hcatalog-table
13/06/04 15:44:39 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure.
Consider using -P instead.
13/06/04 15:44:39 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
13/06/04 15:44:39 INFO tool.CodeGenTool: Beginning code generation
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text`
AS t LIMIT 1
13/06/04 15:44:39 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text`
AS t LIMIT 1
13/06/04 15:44:39 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
13/06/04 15:44:39 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.java uses or overrides
a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/06/04 15:44:42 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/f726ee2a04cf955e797a4932d94668f7/text.jar
13/06/04 15:44:42 WARN manager.MySQLManager: It looks like you are importing from mysql.
13/06/04 15:44:42 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
13/06/04 15:44:42 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
13/06/04 15:44:42 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull
(mysql)
13/06/04 15:44:42 INFO mapreduce.ImportJobBase: Beginning import of text
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog for import job
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Configuring HCatalog specific details for
job
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: Hive home is not set. job may fail if needed
jar files are not found correctly.  Please set HIVE_HOME in sqoop-env.sh or provide --hive-home
option.  Setting HIVE_HOME  to /usr/lib/hive
13/06/04 15:44:42 WARN hcat.SqoopHCatUtilities: HCatalog home is not set. job may fail if
needed jar files are not found correctly.  Please set HCAT_HOME in sqoop-env.sh or provide
--hcatalog-home option.   Setting HCAT_HOME to /usr/lib/hcatalog
13/06/04 15:44:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `text`
AS t LIMIT 1
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column names projected : [id, txt]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Database column name - type map :
        Names: [id, txt]
        Types : [4, 12]
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Creating HCatalog table default.text for import
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: HCatalog Create table statement: 

create table default.text (
        id int,
        txt string)
stored as rcfile
13/06/04 15:44:42 INFO hcat.SqoopHCatUtilities: Executing HCatalog CLI in-process.
Hive history file=/tmp/root/hive_job_log_65f4f145-0b1e-4e09-8e40-b7edcfc15f83_2077084453.txt
OK
Time taken: 25.121 seconds
[root@bousa-hcat ~]#




src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44346>

    Can we add information here about what will happen if the table already exists and this
    parameter is specified?



src/docs/user/hcatalog.txt
<https://reviews.apache.org/r/10688/#comment44347>

    This seems unnecessary; can we tweak the bash scripts to do this automatically if the hcat
    command is present?



src/java/org/apache/sqoop/mapreduce/ExportJobBase.java
<https://reviews.apache.org/r/10688/#comment44350>

    Nit: I think that this line can be also refactored to the parent class right?



src/java/org/apache/sqoop/mapreduce/ImportJobBase.java
<https://reviews.apache.org/r/10688/#comment44349>

    Nit: I think that this line can be also refactored to the parent class right?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java
<https://reviews.apache.org/r/10688/#comment44351>

    This method seems to be required only for the debug message. Is that the only reason, or
    did I miss something?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44369>

    Nit: It seems that we are doing options = opts; in all cases, so maybe it would be worth
    putting this line before the "if" statement?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44371>

    Nit: Shouldn't the default Hive home be SqoopOptions.getDefaultHiveHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44372>

    Shouldn't the default HCatalog home be SqoopOptions.getDefaultHcatHome()?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44397>

    Both Hive and HBase are idempotent when creating tables, so it might make sense to add
    "IF NOT EXISTS" here in order to remain consistent.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44395>

    It seems that at this point we are not reading the Hive configuration files, yet we are
    executing the in-process Hive CLI. The CLI will therefore not pick up the configuration
    files and will use defaults, which is not consistent with the executed MapReduce job that
    will use the proper configuration files. As a result, the table will be created in a
    different metastore than the one into which we are importing data.
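One possible way to address this (just a sketch; the $HIVE_HOME/conf location and the variable
names are my assumption, not part of the patch) would be to load the client hive-site.xml into
the job configuration before invoking the in-process CLI:

    // Sketch: make the in-process CLI and the MapReduce job read the same hive-site.xml.
    java.io.File hiveSite =
        new java.io.File(new java.io.File(hiveHome, "conf"), "hive-site.xml");
    if (hiveSite.exists()) {
      configuration.addResource(new org.apache.hadoop.fs.Path(hiveSite.toURI()));
    }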



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44378>

    Shouldn't we use SqoopOptions.getDefaultHiveHome() here?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44379>

    Shouldn't we use SqoopOptions.getDefaultHCatHome() here?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44382>

    Nit: Considering that there might be a Hadoop 3 in the future, would it be simpler to change
    the condition to (isLocalMode and isHadoop1) instead of enumerating all other possible
    Hadoop versions?



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44386>

    Nit: Those lines seem to be unused.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44387>

    Can we write the file to a temporary directory rather than the current working directory,
    which might not be writable?
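A minimal sketch using the JDK temp-file facility (the prefix/suffix are just placeholders):

    // Sketch: create the script under java.io.tmpdir instead of the current working directory.
    java.io.File scriptFile = java.io.File.createTempFile("hcat-script-", ".hcatalog");
    scriptFile.deleteOnExit();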



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44388>

    I would suggest altering this to a single line:
    
    LOG.error("Error writing HCatalog load-in script: ", ioe);
    
    That will also print the stack trace.



src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java
<https://reviews.apache.org/r/10688/#comment44389>

    I would suggest changing this line to:
    
    LOG.warn("IOException closing stream to HCatalog script: ", ioe);
    
    That will also print out the stack trace.


Jarcec

- Jarek Cecho


On June 3, 2013, 4:16 a.m., Venkat Ranganathan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10688/
> -----------------------------------------------------------
> 
> (Updated June 3, 2013, 4:16 a.m.)
> 
> 
> Review request for Sqoop and Jarek Cecho.
> 
> 
> Description
> -------
> 
> This patch implements the new feature of integrating HCatalog and Sqoop. With this
> feature, it is possible to import and export data between Sqoop and HCatalog tables. The
> document attached to the SQOOP-931 JIRA issue discusses the high-level approaches.
> 
> With this integration, more fidelity can be brought to the process of moving data between
> enterprise data stores and the Hadoop ecosystem.
> 
> 
> Diffs
> -----
> 
>   build.xml 636c103 
>   ivy.xml 1fa4dd1 
>   ivy/ivysettings.xml c4cc561 
>   src/docs/user/SqoopUserGuide.txt 01ac1cf 
>   src/docs/user/hcatalog.txt PRE-CREATION 
>   src/java/org/apache/sqoop/SqoopOptions.java f18d43e 
>   src/java/org/apache/sqoop/config/ConfigurationConstants.java 5354063 
>   src/java/org/apache/sqoop/hive/HiveImport.java 838f083 
>   src/java/org/apache/sqoop/manager/ConnManager.java a1ac38e 
>   src/java/org/apache/sqoop/mapreduce/DataDrivenImportJob.java ef1d363 
>   src/java/org/apache/sqoop/mapreduce/ExportJobBase.java 1065d0b 
>   src/java/org/apache/sqoop/mapreduce/ImportJobBase.java 2465f3f 
>   src/java/org/apache/sqoop/mapreduce/JdbcExportJob.java 20636a0 
>   src/java/org/apache/sqoop/mapreduce/JobBase.java 0df1156 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportFormat.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatExportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatImportMapper.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatInputSplit.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatRecordReader.java PRE-CREATION 
>   src/java/org/apache/sqoop/mapreduce/hcat/SqoopHCatUtilities.java PRE-CREATION 
>   src/java/org/apache/sqoop/tool/BaseSqoopTool.java 42f521f 
>   src/java/org/apache/sqoop/tool/CodeGenTool.java dd34a97 
>   src/java/org/apache/sqoop/tool/ExportTool.java 215addd 
>   src/java/org/apache/sqoop/tool/ImportTool.java 2627726 
>   src/perftest/ExportStressTest.java 0a41408 
>   src/test/com/cloudera/sqoop/hive/TestHiveImport.java 462ccf1 
>   src/test/com/cloudera/sqoop/testutil/BaseSqoopTestCase.java cf41b96 
>   src/test/com/cloudera/sqoop/testutil/ExportJobTestCase.java e13f3df 
>   src/test/org/apache/sqoop/hcat/HCatalogExportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogImportTest.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/HCatalogTestUtils.java PRE-CREATION 
>   src/test/org/apache/sqoop/hcat/TestHCatalogBasic.java PRE-CREATION 
>   testdata/hcatalog/conf/hive-log4j.properties PRE-CREATION 
>   testdata/hcatalog/conf/hive-site.xml PRE-CREATION 
>   testdata/hcatalog/conf/log4j.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/10688/diff/
> 
> 
> Testing
> -------
> 
> Two new integration test suites with more than 20 tests in total have been added to test
> various aspects of the integration. A unit test to test the option management is also added.
> All tests pass.
> 
> 
> Thanks,
> 
> Venkat Ranganathan
> 
>

