sqoop-user mailing list archives

From Jérôme Verdier <verdier.jerom...@gmail.com>
Subject Re: import from Oracle to Hive : 2 errors
Date Mon, 17 Jun 2013 15:59:45 GMT
I have the solution to my problem.

I resolved it by using the option --hive-table default.VENTES_EAN in my
Sqoop script.
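
For reference, my working invocation now looks roughly like this (the
connection details are placeholders, not the real values):

  sqoop import --connect jdbc:oracle:thin:@my.db.server:1521/xx \
    --username user -P --table THEMIS.VENTES_EAN \
    --create-hive-table --hive-import --hive-table default.VENTES_EAN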

But I have another question:

I want to import only 1000 rows from my Oracle table into Hive. How can I
do this using Sqoop?
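
One approach I'm considering (untested on my side, and assuming Sqoop's
--where option is simply appended to the generated Oracle query) would be to
filter on ROWNUM, for example:

  sqoop import --connect jdbc:oracle:thin:@my.db.server:1521/xx \
    --username user -P --table THEMIS.VENTES_EAN \
    --where "ROWNUM <= 1000" -m 1 \
    --hive-import --hive-table default.VENTES_EAN

(-m 1 keeps the import to a single mapper so the ROWNUM filter is applied in
one query; otherwise each split could bring back up to 1000 rows.)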

Thanks.

--
Jérôme


2013/6/17 Jérôme Verdier <verdier.jerome66@gmail.com>

> Hi Jarcec,
>
> Thanks for your answer, you're always very helpful =)
>
> I think that my Hive installation is OK; I can connect to the Hive server
> from my laptop using JDBC and SQuirreL SQL.
>
> Here are the Hive logs:
>
> 2013-06-17 17:16:40,452 WARN  conf.HiveConf (HiveConf.java:<clinit>(75)) -
> hive-site.xml not found on CLASSPATH
> 2013-06-17 17:20:51,228 WARN  conf.HiveConf (HiveConf.java:<clinit>(75)) -
> hive-site.xml not found on CLASSPATH
> 2013-06-17 17:20:53,296 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2013-06-17 17:20:53,296 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.resources" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.core.runtime" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2013-06-17 17:20:53,297 ERROR DataNucleus.Plugin
> (Log4JLogger.java:error(115)) - Bundle "org.eclipse.jdt.core" requires
> "org.eclipse.text" but it cannot be resolved.
> 2013-06-17 17:20:58,645 ERROR exec.Task
> (SessionState.java:printError(401)) - FAILED: Error in metadata:
> InvalidObjectException(message:There is no database named themis)
> org.apache.hadoop.hive.ql.metadata.HiveException:
> InvalidObjectException(message:There is no database named themis)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:576)
>     at
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3698)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:253)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>     at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
>     at
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
>     at
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:445)
>     at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:455)
>     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:711)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: InvalidObjectException(message:There is no database named
> themis)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1091)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table(HiveMetaStore.java:1070)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>     at $Proxy8.create_table(Unknown Source)
>     at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:432)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>     at $Proxy9.createTable(Unknown Source)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:570)
>     ... 20 more
>
> 2013-06-17 17:20:58,651 ERROR ql.Driver
> (SessionState.java:printError(401)) - FAILED: Execution Error, return code
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> I am trying to import the Oracle table VENTES_EAN, which is located in the
> schema THEMIS (i.e. THEMIS.VENTES_EAN).
>
> But as we can see in the logs, Hive thinks I want to import VENTES_EAN into
> a THEMIS database, whereas I have only one database: default.
>
> Is there a Hive configuration problem here?
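>
> One thing I could try (just a guess on my side, not verified) is to create
> the missing database in Hive first, so that the schema-qualified name
> resolves, e.g.:
>
>   CREATE DATABASE themis;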
>
> Thanks.
>
>
>
>
> 2013/6/17 Jarek Jarcec Cecho <jarcec@apache.org>
>
>> Hi Jerome,
>> Hive import in Sqoop is done in two phases. The first phase transfers the
>> data from your Oracle database to HDFS, just as a normal non-Hive import
>> would. In the second phase, Sqoop invokes Hive to run a LOAD DATA statement
>> that moves the imported data into Hive. In your first Sqoop invocation the
>> first phase obviously finished correctly, but the second phase failed. This
>> is why the second Sqoop invocation is failing: the intermediate directory
>> between the two phases still exists. You can unblock it by removing the
>> directory with an HDFS command, for example:
>>
>>   hadoop dfs -rmr KPI.ENTITE
>>
>> The second phase seems to be failing for you on the following exception:
>>
>> > java.lang.RuntimeException: Unable to instantiate
>>
>> I would therefore suggest taking a look at the Hive logs
>> (/tmp/$USER/hive.log if I'm not mistaken) to see whether there are more
>> details about the instantiation failure. Could you also verify that your
>> Hive installation is configured correctly?
>>
>> Jarcec
>>
>> On Mon, Jun 17, 2013 at 03:46:28PM +0200, Jérôme Verdier wrote:
>> > Hi,
>> >
>> > I'm trying to import various tables from Oracle to Hive using Sqoop,
>> > but I have some errors that I don't understand.
>> >
>> > Here is my query:
>> >
>> > sqoop import --connect jdbc:oracle:thin:@my.db.server:1521/xx --username
>> > user --password password --create-hive-table --hive-import --table
>> > schema.table_xx
>> >
>> > The first error is this one:
>> >
>> > Please set $HBASE_HOME to the root of your HBase installation.
>> > 13/06/17 15:36:40 WARN tool.BaseSqoopTool: Setting your password on the
>> > command-line is insecure. Consider using -P instead.
>> > 13/06/17 15:36:40 INFO tool.BaseSqoopTool: Using Hive-specific
>> delimiters
>> > for output. You can override
>> > 13/06/17 15:36:40 INFO tool.BaseSqoopTool: delimiters with
>> > --fields-terminated-by, etc.
>> > 13/06/17 15:36:40 INFO manager.SqlManager: Using default fetchSize of
>> 1000
>> > 13/06/17 15:36:40 INFO tool.CodeGenTool: Beginning code generation
>> > 13/06/17 15:36:41 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:36:41 INFO manager.SqlManager: Executing SQL statement:
>> SELECT
>> > t.* FROM KPI.ENTITE t WHERE 1=0
>> > 13/06/17 15:36:41 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>> > /usr/local/hadoop
>> > Note:
>> >
>> /tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI_ENTITE.java
>> > uses or overrides a deprecated API.
>> > Note: Recompile with -Xlint:deprecation for details.
>> > 13/06/17 15:36:42 INFO orm.CompilationManager: Writing jar file:
>> >
>> /tmp/sqoop-hduser/compile/85a6dcface4ca6ca28091ed383edce2e/KPI.ENTITE.jar
>> > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE
>> contains
>> > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
>> > only for this job.
>> > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:36:42 WARN manager.OracleManager: The table KPI.ENTITE
>> contains
>> > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
>> > only for this job.
>> > 13/06/17 15:36:42 INFO mapreduce.ImportJobBase: Beginning import of
>> > KPI.ENTITE
>> > 13/06/17 15:36:42 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:36:44 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
>> > SELECT MIN(CO_SOCIETE), MAX(CO_SOCIETE) FROM KPI.ENTITE
>> > 13/06/17 15:36:44 INFO mapred.JobClient: Running job:
>> job_201306171456_0005
>> > 13/06/17 15:36:45 INFO mapred.JobClient:  map 0% reduce 0%
>> > 13/06/17 15:36:56 INFO mapred.JobClient:  map 25% reduce 0%
>> > 13/06/17 15:37:40 INFO mapred.JobClient:  map 50% reduce 0%
>> > 13/06/17 15:38:00 INFO mapred.JobClient:  map 75% reduce 0%
>> > 13/06/17 15:38:08 INFO mapred.JobClient:  map 100% reduce 0%
>> > 13/06/17 15:38:09 INFO mapred.JobClient: Job complete:
>> job_201306171456_0005
>> > 13/06/17 15:38:09 INFO mapred.JobClient: Counters: 18
>> > 13/06/17 15:38:09 INFO mapred.JobClient:   Job Counters
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=151932
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Total time spent by all
>> > reduces waiting after reserving slots (ms)=0
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Total time spent by all
>> maps
>> > waiting after reserving slots (ms)=0
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Launched map tasks=4
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>> > 13/06/17 15:38:09 INFO mapred.JobClient:   File Output Format Counters
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Bytes Written=26648
>> > 13/06/17 15:38:09 INFO mapred.JobClient:   FileSystemCounters
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     HDFS_BYTES_READ=462
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=244596
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=26648
>> > 13/06/17 15:38:09 INFO mapred.JobClient:   File Input Format Counters
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Bytes Read=0
>> > 13/06/17 15:38:09 INFO mapred.JobClient:   Map-Reduce Framework
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Map input records=339
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Physical memory (bytes)
>> > snapshot=171716608
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Spilled Records=0
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     CPU time spent (ms)=3920
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Total committed heap usage
>> > (bytes)=65011712
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Virtual memory (bytes)
>> > snapshot=1492393984
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     Map output records=339
>> > 13/06/17 15:38:09 INFO mapred.JobClient:     SPLIT_RAW_BYTES=462
>> > 13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Transferred 26,0234 KB
>> in
>> > 86,6921 seconds (307,3869 bytes/sec)
>> > 13/06/17 15:38:09 INFO mapreduce.ImportJobBase: Retrieved 339 records.
>> > 13/06/17 15:38:09 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:38:09 INFO manager.SqlManager: Executing SQL statement:
>> SELECT
>> > t.* FROM KPI.ENTITE t WHERE 1=0
>> > 13/06/17 15:38:09 WARN hive.TableDefWriter: Column CO_SOCIETE had to be
>> > cast to a less precise type in Hive
>> > 13/06/17 15:38:09 INFO hive.HiveImport: Removing temporary files from
>> > import process: hdfs://localhost:54310/user/hduser/KPI.ENTITE/_logs
>> > 13/06/17 15:38:09 INFO hive.HiveImport: Loading uploaded data into Hive
>> > 13/06/17 15:38:11 INFO hive.HiveImport: WARNING:
>> > org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use
>> > org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
>> > files.
>> > 13/06/17 15:38:12 INFO hive.HiveImport: Logging initialized using
>> > configuration in
>> >
>> jar:file:/usr/local/hive/lib/hive-common-0.10.0.jar!/hive-log4j.properties
>> > 13/06/17 15:38:12 INFO hive.HiveImport: Hive history
>> > file=/tmp/hduser/hive_job_log_hduser_201306171538_49452696.txt
>> > 13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Error in metadata:
>> > java.lang.RuntimeException: Unable to instantiate
>> > org.apache.hadoop.hive.metastore.HiveMetaStoreClient
>> > 13/06/17 15:38:14 INFO hive.HiveImport: FAILED: Execution Error, return
>> > code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>> > 13/06/17 15:38:14 ERROR tool.ImportTool: Encountered IOException running
>> > import job: java.io.IOException: Hive exited with status 1
>> >         at
>> >
>> org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:364)
>> >         at
>> > org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:314)
>> >         at
>> org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:226)
>> >         at
>> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>> >
>> > I don't understand, because the M/R job completed, but after that it
>> > gives me an I/O error.
>> >
>> > When I try a SHOW TABLES in Hive, I have no tables.
>> >
>> > But when I retry the Sqoop script, I get this error:
>> >
>> > Warning: /usr/lib/hbase does not exist! HBase imports will fail.
>> > Please set $HBASE_HOME to the root of your HBase installation.
>> > 13/06/17 15:41:51 WARN tool.BaseSqoopTool: Setting your password on the
>> > command-line is insecure. Consider using -P instead.
>> > 13/06/17 15:41:51 INFO tool.BaseSqoopTool: Using Hive-specific
>> delimiters
>> > for output. You can override
>> > 13/06/17 15:41:51 INFO tool.BaseSqoopTool: delimiters with
>> > --fields-terminated-by, etc.
>> > 13/06/17 15:41:51 INFO manager.SqlManager: Using default fetchSize of
>> 1000
>> > 13/06/17 15:41:51 INFO tool.CodeGenTool: Beginning code generation
>> > 13/06/17 15:42:15 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:42:15 INFO manager.SqlManager: Executing SQL statement:
>> SELECT
>> > t.* FROM KPI.ENTITE t WHERE 1=0
>> > 13/06/17 15:42:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
>> > /usr/local/hadoop
>> > Note:
>> >
>> /tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI_ENTITE.java
>> > uses or overrides a deprecated API.
>> > Note: Recompile with -Xlint:deprecation for details.
>> > 13/06/17 15:42:16 INFO orm.CompilationManager: Writing jar file:
>> >
>> /tmp/sqoop-hduser/compile/10cd05e9146a878654b1155df5be7765/KPI.ENTITE.jar
>> > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE
>> contains
>> > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
>> > only for this job.
>> > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:42:16 WARN manager.OracleManager: The table KPI.ENTITE
>> contains
>> > a multi-column primary key. Sqoop will default to the column CO_SOCIETE
>> > only for this job.
>> > 13/06/17 15:42:16 INFO mapreduce.ImportJobBase: Beginning import of
>> > KPI.ENTITE
>> > 13/06/17 15:42:16 INFO manager.OracleManager: Time zone has been set to
>> GMT
>> > 13/06/17 15:42:17 INFO mapred.JobClient: Cleaning up the staging area
>> >
>> hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201306171456_0006
>> > 13/06/17 15:42:17 ERROR security.UserGroupInformation:
>> > PriviledgedActionException as:hduser
>> > cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output
>> directory
>> > KPI.ENTITE already exists
>> > 13/06/17 15:42:17 ERROR tool.ImportTool: Encountered IOException running
>> > import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
>> > directory KPI.ENTITE already exists
>> >         at
>> >
>> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
>> >         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:949)
>> >         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
>> >         at java.security.AccessController.doPrivileged(Native Method)
>> >         at javax.security.auth.Subject.doAs(Subject.java:396)
>> >         at
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>> >         at
>> > org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
>> >         at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
>> >         at
>> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
>> >         at
>> >
>> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:173)
>> >         at
>> > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:151)
>> >         at
>> >
>> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:221)
>> >         at
>> > org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:545)
>> >         at
>> >
>> org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:380)
>> >         at
>> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:22
>> >
>> > The output says that the output directory already exists.
>> >
>> > But the Hive command SHOW TABLES gives me zero tables!
>> >
>> > Thanks for your help ;-)
>> >
>> >
>> > --
>> > Jérôme
>>
>
>
