sqoop-user mailing list archives

From Krishnan K <kkrishna...@gmail.com>
Subject Re: Incremental import fail to hive from PostgreSQL
Date Sun, 15 Apr 2012 23:48:38 GMT
Hi Roshan,

Once you have run the Sqoop command, it creates an output directory in HDFS
even if the import fails. Delete this directory (users) with

*hadoop dfs -rmr users*

and then run the Sqoop command again.

Sqoop first imports the data from PostgreSQL into HDFS and
then moves it into the Hive warehouse directory
(/user/hive/warehouse/<tablename>)

For Sqoop to stage the data in HDFS first, you must ensure that the output
directory does not already exist.
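The cleanup-and-rerun sequence above might look like the following sketch. It reuses the paths and placeholders from this thread (`/app/sqoop/bin/sqoop`, the `users` table, `<server_url>`/`<database>`); the existence checks are only there so the sketch is a no-op on a machine without a Hadoop client installed:

```shell
#!/bin/sh
# Sketch: remove the stale staging directory left by the failed run,
# then re-run the incremental import.
TARGET_DIR=users

# On Hadoop 0.20.x the recursive HDFS delete is "hadoop dfs -rmr";
# later releases spell it "hdfs dfs -rm -r". Skipped if no client.
if command -v hadoop >/dev/null 2>&1; then
  hadoop dfs -rmr "$TARGET_DIR"
fi

# Re-run the import. Sqoop stages the rows into HDFS ("users") first,
# then loads them into /user/hive/warehouse/users.
SQOOP=/app/sqoop/bin/sqoop
if [ -x "$SQOOP" ]; then
  "$SQOOP" import \
    --connect "jdbc:postgresql://<server_url>/<database>" \
    --table users --username XXXXXXX --password YYYYYY \
    --hive-home /app/hive --hive-import --hive-table users \
    --incremental lastmodified --check-column lastmodifiedtime
fi
```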

-Krishnan

On Mon, Apr 16, 2012 at 4:49 AM, Roshan Pradeep <codevally@gmail.com> wrote:

> Hi All
>
> I want to import the updated data from my source (PostgreSQL) to hive
> based on a column (lastmodifiedtime) in postgreSQL
>
> *The command I am using*
>
> /app/sqoop/bin/sqoop import --hive-table users --connect
> jdbc:postgresql://<server_url>/<database> --table users --username XXXXXXX
> --password YYYYYY --hive-home /app/hive --hive-import --incremental
> lastmodified --check-column lastmodifiedtime
>
> With the above command, I am getting the below error
>
> 12/04/13 16:31:21 INFO orm.CompilationManager: Writing jar file:
> /tmp/sqoop-root/compile/11ce8600a5656ed49e631a260c387692/users.jar
> 12/04/13 16:31:21 INFO tool.ImportTool: Incremental import based on column
> "lastmodifiedtime"
> 12/04/13 16:31:21 INFO tool.ImportTool: Upper bound value: '2012-04-13
> 16:31:21.865429'
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: It looks like you are
> importing from postgresql.
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: This transfer can be
> faster! Use the --direct
> 12/04/13 16:31:21 WARN manager.PostgresqlManager: option to exercise a
> postgresql-specific fast path.
> 12/04/13 16:31:21 INFO mapreduce.ImportJobBase: Beginning import of users
> 12/04/13 16:31:23 ERROR tool.ImportTool: Encountered IOException running
> import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output
> directory users already exists
>         at
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:123)
>         at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:770)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
>         at
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:141)
>         at
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:201)
>         at
> org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:413)
>         at
> org.apache.sqoop.manager.PostgresqlManager.importTable(PostgresqlManager.java:102)
>         at
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:380)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:453)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
>         at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
>
> According to the above stack trace, Sqoop identifies the updated data
> from PostgreSQL, but it says the output directory already exists. Could
> someone please help me correct this issue?
>
> *I am using *
>
> Hadoop - 0.20.2
> Hive - 0.8.1
> Sqoop - 1.4.1-incubating
>
>
> Thanks.
>
