sqoop-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Network resilience of sqoop 1.4.6
Date Mon, 25 Jan 2016 14:51:19 GMT
Hi Nitin,
here is my stab at answering the question:

> 	• Does sqoop perform a clean up of the already imported/exported data?

An import writes to a temporary directory, so if the job doesn’t finish, all partially imported
data gets dropped. On the export side we use a lot of smaller transactions, so a failure will
leave you with a partial export. However, we have an option to export through a staging table,
which is designed to deal with this partial export issue. I would suggest taking a look at our
user guide [1].
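
The staging-table export mentioned above could look roughly like the following sketch. It only
assembles the command line and does not run anything. The JDBC URL, table names and HDFS path
are made-up placeholders; `--staging-table` and `--clear-staging-table` are the options described
in the export section of the user guide [1]. The staging table is assumed to exist already, with
the same structure as the target table.

```python
import shlex


def build_export_cmd(connect, table, staging_table, export_dir):
    """Assemble a `sqoop export` command that writes through a staging table."""
    return [
        "sqoop", "export",
        "--connect", connect,              # JDBC URL of the target database
        "--table", table,                  # final destination table
        "--export-dir", export_dir,        # HDFS directory holding the data
        "--staging-table", staging_table,  # rows land here first
        "--clear-staging-table",           # empty the staging table before the run
    ]


# All values below are hypothetical placeholders.
cmd = build_export_cmd(
    connect="jdbc:mysql://db.example.com/sales",
    table="orders",
    staging_table="orders_stage",
    export_dir="/user/nitin/orders",
)
print(shlex.join(cmd))
```

With a staging table, rows reach the destination table only after the export itself has
succeeded, so a mid-job network failure should leave the destination table untouched.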

> 	• Does sqoop automatically restart the job in the case of network failure?

There are multiple levels of parallelism and retries. If one task fails, Hadoop will by default
re-run it 3 times before killing the whole job. We don’t restart the whole job, because we
assume that if 3 retries didn’t help, there is no point in trying again.
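
The per-task retry limit is a Hadoop setting rather than a Sqoop one. As a sketch, and assuming
a Hadoop 2 cluster where the property is named `mapreduce.map.maxattempts` (it counts total
attempts per map task, not only the re-runs), it could be passed as a generic `-D` argument
placed directly after the tool name. The connection string, table and target directory below
are placeholders, and the helper name is made up for illustration.

```python
def build_import_cmd(connect, table, target_dir, max_attempts):
    """Assemble a `sqoop import` command with an explicit map-task attempt limit."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return [
        "sqoop", "import",
        # Generic Hadoop arguments have to come before the Sqoop-specific ones.
        "-D", f"mapreduce.map.maxattempts={max_attempts}",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]


# Hypothetical values; 6 total attempts means up to 5 re-runs per map task.
import_cmd = build_import_cmd(
    connect="jdbc:mysql://db.example.com/sales",
    table="orders",
    target_dir="/user/nitin/orders",
    max_attempts=6,
)
print(" ".join(import_cmd))
```

Raising the limit only helps with short outages; if the network stays down longer than the
retries take, the job will still fail and has to be started again by hand.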

Jarcec

Links:
1: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal


> On Jan 24, 2016, at 10:30 PM, Nitin Kumar <nk94.nitinkumar@gmail.com> wrote:
> 
> 
> I am using apache sqoop 1.4.6 (distributed with HortonWorks HDP 2.3 package) to import
> and export data between rdbms systems and hdfs. I have to deploy this in a production environment
> and was wondering about the network resilience of sqoop.
> Say I'm done with about 90% of the import/export job and there is a network failure between
> the rdbms system and my hadoop cluster. Since sqoop internally executes a map/reduce job for
> this, I'm guessing the job will fail completely and require a manual restart. In this regard
> I have the following questions:
> 
> 	• Does sqoop perform a clean up of the already imported/exported data?
> 	• Does sqoop automatically restart the job in the case of network failure?
> 	• If a manual clean up and restart is required, what other technology alongside sqoop
> 	  do people generally use to achieve network resilience?
> 	• Is there a different version of sqoop that offers this feature?
> Your answers and suggestions would be highly appreciated.
> 
> Thanks!
> 

