sqoop-user mailing list archives

From Gwen Shapira <gshap...@cloudera.com>
Subject Re: Sqoop Import parallel sessions - Question
Date Thu, 14 Aug 2014 16:24:09 GMT
Sqoop needs to write to a directory that doesn't exist yet. Since both of
your jobs try to write to the same directory, one of them will complain
that the directory already exists.

You can use the --warehouse-dir or --target-dir parameters to make sure
each job writes to its own directory.
Or, you can use the --hive-partition-key and --hive-partition-value
parameters to import the data into separate Hive partitions (which makes
sense from a table-design perspective too).
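
The first suggestion above can be sketched as a small shell loop that gives each month its own output directory. Everything in it (the connect string, table name, base path, and the simplified 28-day month end) is a hypothetical placeholder, not taken from this thread, and the script only prints the commands rather than executing them:

```shell
# Sketch only: prints one sqoop import command per month of 2013, each with
# its own --target-dir so parallel runs don't collide on the output path.
# HOST:1521:SID, MYTABLE, and BASE_DIR are hypothetical placeholders.
BASE_DIR=/user/etl/mytable   # hypothetical HDFS base path

gen_import_cmds() {
  for MONTH in JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC; do
    echo "sqoop import --connect jdbc:oracle:thin:@HOST:1521:SID" \
         "--table MYTABLE -m 1" \
         "--where \"date between '01-$MONTH-2013' and '28-$MONTH-2013'\"" \
         "--target-dir $BASE_DIR/2013_$MONTH"
  done
}

gen_import_cmds   # in real use, run each printed line (with & to parallelize)
```

For the second suggestion, you would instead keep --hive-import and add something like --hive-partition-key import_month --hive-partition-value 2013_JAN (the key and value names here are made up) so that each parallel job lands in its own partition of the same Hive table.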

On Thu, Aug 14, 2014 at 9:12 AM, Sethuramaswamy, Suresh
<suresh.sethuramaswamy@credit-suisse.com> wrote:
> Sure.
>
> This is my command. When I run two of these commands in parallel, I get the exception
mentioned below.
>
> sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> --table <Table_name> \
>   --where "date between '01-JAN-2013' and '30-JAN-2013'" -m 1 \
>   --hive-import --hive-table <hive tablename> \
>   --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
>   --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims;
>
> ...
> ...
>
> ..
>
> sqoop import --connect jdbc:oracle:thin:@<<ORACLE DB DETAILS>> --table <Table_name> \
>   --where "date between '01-DEC-2013' and '31-DEC-2013'" -m 1 \
>   --hive-import --hive-table <hive tablename> \
>   --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
>   --null-string '\\N' --null-non-string '\\N' --hive-drop-import-delims;
>
>
>
> Exception:
>
>
> 14/08/14 12:04:57 ERROR tool.ImportTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory <SCHEMA>.<TABLENAME> already exists
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:987)
>         at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:582)
>         at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:612)
>         at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
>         at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:247)
>         at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:614)
>         at org.apache.sqoop.manager.OracleManager.importTable(OracleManager.java:436)
>         at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
>         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:506)
>         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:222)
>         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:231)
>         at org.apache.sqoop.Sqoop.main(Sqoop.java:240)
>
>
> -----Original Message-----
> From: Jarek Jarcec Cecho [mailto:jarcec@gmail.com] On Behalf Of Jarek Jarcec Cecho
> Sent: Thursday, August 14, 2014 11:41 AM
> To: user@sqoop.apache.org
> Subject: Re: Sqoop Import parallel sessions - Question
>
> It would be helpful if you could share your entire Sqoop commands and the exact exception
with its stack trace.
>
> Jarcec
>
> On Aug 14, 2014, at 7:57 AM, Sethuramaswamy, Suresh <suresh.sethuramaswamy@credit-suisse.com> wrote:
>
>> Team,
>>
>> We had to initiate a Sqoop import for one month's worth of records per session, so I
need to run 12 such statements in parallel in order to read a full year of data. While
I do this,
>>
>> I keep getting the error that the <SCHEMA>.<TABLENAME> folder already exists.
This is because all of these sessions are initiated with the same uid, and each uses the
mapred temporary HDFS folder under that user's home directory until it completes.
>>
>> Is there a better way for me to accomplish this?
>>
>>
>> Thanks
>> Suresh Sethuramaswamy
>>
>>
>>
>> ==============================================================================
>> Please access the attached hyperlink for an important electronic communications disclaimer:
>> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
>> ==============================================================================
>
>
>
>
