sqoop-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: import from Oracle to Hive : 2 errors
Date Mon, 17 Jun 2013 16:54:39 GMT
Hi Jerome,
I've noticed that you've already overcome this issue; however, let me explain what
happened for the benefit of future readers.

The Oracle connector overloads the --table parameter to specify both the table name
and the owner name. However, as this overloading is specific to the Oracle connector,
other parts of Sqoop still expect only a table name in the --table parameter. For
example, the Hive import part of Sqoop will interpret a value of the form OWNER.TABLE
as DATABASE.TABLE for Hive, and if that DATABASE does not exist, the import will fail.
The workaround is to specify the --hive-table parameter to set the correct table name
without the owner part.
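
For illustration, reusing the connection details quoted below (the server, SID, and
username are placeholders taken from this thread; -P prompts for the password), the
corrected invocation might look like this:

  sqoop import \
    --connect jdbc:oracle:thin:@my.db.server:1521/xx \
    --username user -P \
    --table THEMIS.VENTES_EAN \
    --create-hive-table --hive-import \
    --hive-table ventes_ean

Alternatively, creating the database first in Hive (CREATE DATABASE themis;) and
importing into themis.ventes_ean should also avoid the InvalidObjectException seen
below.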

Jarcec

P.S. - This and many other Sqoop use cases are described in depth in the upcoming Sqoop Cookbook:

http://shop.oreilly.com/product/0636920029519.do

On Mon, Jun 17, 2013 at 05:31:52PM +0200, Jérôme Verdier wrote:
> Hi Jarcec,
> 
> Thanks for your answer, you're always very helpful =)
> 
> I think my Hive installation is OK: I can connect to the Hive server
> from my laptop using JDBC and SQuirreL SQL.
> 
> Here are the Hive logs:
> 
> 2013-06-17 17:16:40,452 WARN  conf.HiveConf (HiveConf.java:<clinit>(75)) -
> hive-site.xml not found on CLASSPATH
> 2013-06-17 17:20:51,228 WARN  conf.HiveConf (HiveConf.java:<clinit>(75)) -
> hive-site.xml not found on CLASSPATH
> 2013-06-17 17:20:53,296 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2013-06-17 17:20:53,296 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2013-06-17 17:20:58,645 ERROR exec.Task (SessionState.java:printError(401))
> - FAILED: Error in metadata: InvalidObjectException(message:There is no
> database named themis)
> org.apache.hadoop.hive.ql.metadata.HiveException:
> InvalidObjectException(message:There is no database named themis)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:576)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3698)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:253)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
>     at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
>     at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: InvalidObjectException(message:There is no database named themis)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1091)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1070)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>     at $Proxy8.create_table(Unknown Source)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:432)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>     at $Proxy9.createTable(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:570)
>     ... 20 more
> 
> 2013-06-17 17:20:58,651 ERROR ql.Driver (SessionState.java:printError(401))
> - FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask
> 
> I am trying to import the Oracle table VENTES_EAN located in the schema THEMIS
> ==> THEMIS.VENTES_EAN
> 
> But as we can see in the logs, Hive thinks that I want to import
> VENTES_EAN into a THEMIS database, while I have only one database:
> default.
> 
> Is there a Hive configuration problem here?
> 
> Thanks.
> 
> 
> 
> 
> 2013/6/17 Jarek Jarcec Cecho <jarcec@apache.org>
> 
> > Hi Jerome,
> > Hive import in Sqoop is done in two phases. The first phase transfers the
> > data from your Oracle database to HDFS just as a normal non-Hive import
> > would. Subsequently, in the second phase, Sqoop invokes Hive to run a LOAD
> > DATA statement that moves the imported data into Hive. In your first Sqoop
> > invocation the first step obviously finished correctly, but the second
> > phase failed. This is why the second Sqoop invocation is failing: the
> > intermediate directory left between the two phases still exists. You can
> > unblock it by removing the directory with an HDFS command, for example:
> >
> >   hadoop dfs -rmr KPI.ENTITE
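> >
> > (On newer Hadoop releases the deprecated "hadoop dfs -rmr" form is
> > replaced by "hadoop fs -rm -r KPI.ENTITE"; either removes the leftover
> > directory.)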
> >
> > The second phase seems to be failing for you on the following exception:
> >
> > > java.lang.RuntimeException: Unable to instantiate
> >
> > I would therefore suggest taking a look at the Hive logs
> > (/tmp/$USER/hive.log if I'm not mistaken) to see if there are more
> > details about the instantiation failure. Could you also verify that your
> > Hive installation is configured correctly?
> >
> > Jarcec
> >
> > On Mon, Jun 17, 2013 at 03:46:28PM +0200, Jérôme Verdier wrote:
> > > Hi,
> > >
> > > I'm trying to import various tables from Oracle to Hive using Sqoop, but I
> > > have some errors that I don't understand.
> > >
> > > Here is my query:
> > >
> > > sqoop import --connect jdbc:oracle:thin:@my.db.server:1521/xx --username
> > > user --password password --create-hive-table --hive-import --table
> > > schema.table_xx
> > >
> > > The first error is this one:
> > >
> > > Please set $HBASE_HOME to the root of your HBase installation.
> > > 13/06/17 15:36:40 WARN tool.BaseSqoopTool: Setting your password on the
> > > command-line is insecure. Consider using -P instead.
> > > 13/06/17 15:36:40 INFO tool.BaseSqoopTool: Using Hive-specific delimiters
> > > for output. You can override
> > > 13/06/17 15:36:40 INFO tool.BaseSqoopTool: delimiters with
> > > --fields-terminated-by, etc.
> > > 13/06/17 15:36:40 INFO manager.SqlManager: Using default fetchSize of
> > 1000
> > > 13/06/17 15:36:40 INFO tool.CodeGenTool: Beginning code generation
> > > 13/06/17 15:36:41 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:36:41 INFO manager.SqlManager: Executing SQL statement:
> > SELECT
> > > t.* FROM KPI.ENTITE t WHERE 1=0
> > > 13/06/17 15:36:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> > > /usr/local/hadoop
> > > Note:
> > >
> > /tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI_ENTITE.java
> > > uses or overrides a deprecated API.
> > > Note: Recompile with -Xlint:deprecation for details.
> > > 13/06/17 15:36:42 INFO orm.CompilationManager: Writing jar file:
> > > /tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI.ENTITE.jar
> > > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE
> > contains
> > > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
> > > only for this job.
> > > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE
> > contains
> > > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
> > > only for this job.
> > > 13/06/17 15:36:42 INFO mapreduce.ImportJobBase: Beginning import of
> > > KPI.ENTITE
> > > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:36:44 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
> > > SELECT MIN(CO_SOCIETE), MAX(CO_SOCIETE) FROM KPI.ENTITE
> > > 13/06/17 15:36:44 INFO mapred.JobClient: Running job:
> > job_201306171456_0005
> > > 13/06/17 15:36:45 INFO mapred.JobClient:  map 0% reduce 0%
> > > 13/06/17 15:36:56 INFO mapred.JobClient:  map 25% reduce 0%
> > > 13/06/17 15:37:40 INFO mapred.JobClient:  map 50% reduce 0%
> > > 13/06/17 15:38:00 INFO mapred.JobClient:  map 75% reduce 0%
> > > 13/06/17 15:38:08 INFO mapred.JobClient:  map 100% reduce 0%
> > > 13/06/17 15:38:09 INFO mapred.JobClient: Job complete:
> > job_201306171456_0005
> > > 13/06/17 15:38:09 INFO mapred.JobClient: Counters: 18
> > > 13/06/17 15:38:09 INFO mapred.JobClient:   Job Counters
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=151932
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Total time spent by all
> > > reduces waiting after reserving slots (ms)=0
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Total time spent by all maps
> > > waiting after reserving slots (ms)=0
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Launched map tasks=4
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> > > 13/06/17 15:38:09 INFO mapred.JobClient:   File Output Format Counters
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Bytes Written=26648
> > > 13/06/17 15:38:09 INFO mapred.JobClient:   FileSystemCounters
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     HDFS_BYTES_READ=462
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=244596
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=26648
> > > 13/06/17 15:38:09 INFO mapred.JobClient:   File Input Format Counters
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Bytes Read=0
> > > 13/06/17 15:38:09 INFO mapred.JobClient:   Map-Reduce Framework
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Map input records=339
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Physical memory (bytes)
> > > snapshot=171716608
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Spilled Records=0
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     CPU time spent (ms)=3920
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Total committed heap usage
> > > (bytes)=65011712
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Virtual memory (bytes)
> > > snapshot=1492393984
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     Map output records=339
> > > 13/06/17 15:38:09 INFO mapred.JobClient:     SPLIT_RAW_BYTES=462
> > > 13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Transferred 26,0234 KB in
> > > 86,6921 seconds (307,3869 bytes/sec)
> > > 13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Retrieved 339 records.
> > > 13/06/17 15:38:09 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:38:09 INFO manager.SqlManager: Executing SQL statement:
> > SELECT
> > > t.* FROM KPI.ENTITE t WHERE 1=0
> > > 13/06/17 15:38:09 WARN hive.TableDefWriter: Column CO_SOCIETE had to be
> > > cast to a less precise type in Hive
> > > 13/06/17 15:38:09 INFO hive.HiveImport: Removing temporary files from
> > > import process: hdfs://localhost:54310/user/hduser/KPI.ENTITE/_logs
> > > 13/06/17 15:38:09 INFO hive.HiveImport: Loading uploaded data into Hive
> > > 13/06/17 15:38:11 INFO hive.HiveImport: WARNING:
> > > org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use
> > > org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> > > files.
> > > 13/06/17 15:38:12 INFO hive.HiveImport: Logging initialized using
> > > configuration in
> > >
> > jar:file:/usr/local/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
> > > 13/06/17 15:38:12 INFO hive.HiveImport: Hive history
> > > file=/tmp/hduser/hive_job_log_hduser_201306171538_49452696.txt
> > > 13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Error in metadata:
> > > java.lang.RuntimeException: Unable to instantiate
> > > org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> > > 13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Execution Error, return
> > > code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
> > > 13/06/17 15:38:14 ERROR tool.ImportTool: Encountered IOException running
> > > import job: java.io.IOException: Hive exited with status 1
> > >         at
> > >
> > org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:364)
> > >         at
> > > org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:314)
> > >         at
> > org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:226)
> > >         at
> > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> > >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
> > >         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> > >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> > >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> > >         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> > >
> > > I don't understand: the M/R job completed, but after that it gives me
> > > an I/O error.
> > >
> > > When I run SHOW TABLES in Hive, I have no tables.
> > >
> > > But when I retry the Sqoop script, I get this error:
> > >
> > > Warning: /usr/lib/hbase does not exist! HBase imports will fail.
> > > Please set $HBASE_HOME to the root of your HBase installation.
> > > 13/06/17 15:41:51 WARN tool.BaseSqoopTool: Setting your password on the
> > > command-line is insecure. Consider using -P instead.
> > > 13/06/17 15:41:51 INFO tool.BaseSqoopTool: Using Hive-specific delimiters
> > > for output. You can override
> > > 13/06/17 15:41:51 INFO tool.BaseSqoopTool: delimiters with
> > > --fields-terminated-by, etc.
> > > 13/06/17 15:41:51 INFO manager.SqlManager: Using default fetchSize of
> > 1000
> > > 13/06/17 15:41:51 INFO tool.CodeGenTool: Beginning code generation
> > > 13/06/17 15:42:15 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:42:15 INFO manager.SqlManager: Executing SQL statement:
> > SELECT
> > > t.* FROM KPI.ENTITE t WHERE 1=0
> > > 13/06/17 15:42:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
> > > /usr/local/hadoop
> > > Note:
> > >
> > /tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI_ENTITE.java
> > > uses or overrides a deprecated API.
> > > Note: Recompile with -Xlint:deprecation for details.
> > > 13/06/17 15:42:16 INFO orm.CompilationManager: Writing jar file:
> > > /tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI.ENTITE.jar
> > > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE
> > contains
> > > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
> > > only for this job.
> > > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE
> > contains
> > > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
> > > only for this job.
> > > 13/06/17 15:42:16 INFO mapreduce.ImportJobBase: Beginning import of
> > > KPI.ENTITE
> > > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
> > GMT
> > > 13/06/17 15:42:17 INFO mapred.JobClient: Cleaning up the staging area
> > >
> > hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201306171456_0006
> > > 13/06/17 15:42:17 ERROR security.UserGroupInformation:
> > > PriviledgedActionException as:hduser
> > > cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output
> > directory
> > > KPI.ENTITE already exists
> > > 13/06/17 15:42:17 ERROR tool.ImportTool: Encountered IOException running
> > > import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
> > > directory KPI.ENTITE already exists
> > >         at
> > >
> > org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
> > >         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:949)
> > >         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
> > >         at java.security.AccessController.doPrivileged(Native Method)
> > >         at javax.security.auth.Subject.doAs(Subject.java:396)
> > >         at
> > >
> > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
> > >         at
> > > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
> > >         at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
> > >         at
> > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
> > >         at
> > >
> > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:173)
> > >         at
> > > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:151)
> > >         at
> > >
> > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:221)
> > >         at
> > > org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:545)
> > >         at
> > >
> > org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:380)
> > >         at
> > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
> > >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
> > >         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> > >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> > >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> > >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:22
> > >
> > > The log says that the output directory already exists.
> > >
> > > But the Hive command SHOW TABLES gives me zero tables!
> > >
> > > Thanks for your help ;-)
> > >
> > >
> > > --
> > > Jérôme
> >
> 
> 
> 
> -- 
> *Jérôme VERDIER*
> 06.72.19.17.31
> verdier.jerome66@gmail.com
