sqoop-user mailing list archives

From Roshan Pradeep <codeva...@gmail.com>
Subject Incremental import fail to hive from PostgreSQL
Date Sun, 15 Apr 2012 23:19:54 GMT
Hi All

I want to import updated data from my source (PostgreSQL) into Hive, based
on a column (lastmodifiedtime) in PostgreSQL.

*The command I am using*

/app/sqoop/bin/sqoop import --hive-table users --connect
jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
--password YYYYYY --hive-home /app/hive --hive-import --incremental
lastmodified --check-column lastmodifiedtime

With the above command, I get the following error:

12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on column
"lastmodifiedtime"
12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13
16:31:21.865429'
12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are
importing from postgresql.
12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be
faster! Use the --direct
12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a
postgresql-specific fast path.
12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running
import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
directory users already exists
        at
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
        at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at
org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
        at
org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
        at
org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
        at
org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
        at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)

According to the above stack trace, Sqoop identifies the updated data in
PostgreSQL, but the import then fails because the output directory already
exists. Could someone please help me correct this issue?
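(For reference, a sketch of the usual workaround for this kind of
`FileAlreadyExistsException`: the MapReduce staging directory from a previous
run is still on HDFS, so either delete it before re-running, or point the
import at a fresh location with `--target-dir`. The directory name `users` and
the target path below are assumptions based on the error message, not taken
from the original setup.)

```shell
# Remove the stale staging directory left behind by the earlier failed run
# (directory name taken from the "Output directory users already exists" error;
# adjust the path to wherever the previous import wrote its output).
hadoop fs -rmr users

# ...or re-run the import against a fresh HDFS directory instead
# (the path /user/hive/staging/users is a hypothetical example):
/app/sqoop/bin/sqoop import --hive-table users --connect \
  jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX \
  --password YYYYYY --hive-home /app/hive --hive-import \
  --incremental lastmodified --check-column lastmodifiedtime \
  --target-dir /user/hive/staging/users
```

Note that `hadoop fs -rmr` is the recursive-delete form used on Hadoop 0.20.x;
the commands above need a running cluster and are shown only as a sketch.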

*I am using*

Hadoop - 0.20.2
Hive - 0.8.1
Sqoop - 1.4.1-incubating


Thanks.
