sqoop-user mailing list archives

From Nitin Kumar <nk94.nitinku...@gmail.com>
Subject Re: Network resilience of sqoop 1.4.6
Date Fri, 29 Jan 2016 07:33:49 GMT
Thanks Jarcec for clearing that up!

Regards,
Nitin

On Mon, Jan 25, 2016 at 8:21 PM, Jarek Jarcec Cecho <jarcec@apache.org>
wrote:

> Hi Nitin,
> here is my stab at answering the question:
>
> >       • Does sqoop perform a clean up of the already imported/exported
> >         data?
>
> Import writes to a temporary directory; if the job doesn't finish, all
> partially imported data gets dropped. On the export side we use a lot of
> smaller transactions, so you will get a partial export in case of failure.
> However, we have an option to export through a staging table, which is
> designed to deal with this partial-export issue. I would suggest taking a
> look at our user guide [1].
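
As a rough illustration of the staging-table export mentioned above, the following Python sketch assembles a `sqoop export` command line. The JDBC URL, table names and HDFS path are placeholders, and `build_export_cmd` is a hypothetical helper, not part of Sqoop; `--staging-table` and `--clear-staging-table` are the export options described in the user guide [1].

```python
import shlex


def build_export_cmd(connect, table, staging_table, export_dir):
    """Assemble a `sqoop export` argument list that goes through a staging table.

    All argument values are caller-supplied placeholders; this helper is
    hypothetical and only builds the command, it does not run it.
    """
    return [
        "sqoop", "export",
        "--connect", connect,
        "--table", table,
        # Rows are written to the staging table first and moved to the
        # target table only once the export tasks have succeeded.
        "--staging-table", staging_table,
        # Empty the staging table before the job starts.
        "--clear-staging-table",
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    cmd = build_export_cmd(
        "jdbc:mysql://db.example.com/sales",  # placeholder JDBC URL
        "orders",                             # placeholder target table
        "orders_stage",                       # placeholder staging table
        "/data/orders",                       # placeholder HDFS directory
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Per the user guide, the staging table has to be structurally identical to the target table and must be empty before the job runs, which is why the sketch passes `--clear-staging-table`.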
>
> >       • Does sqoop automatically restart the job in the case of network
> >         failure?
>
> There are multiple levels of parallelism and retries. If one task fails,
> Hadoop will by default re-run it 3 times before killing the whole job.
> We're not restarting the whole job, as we're assuming that if 3 retries
> didn't help, there is no point in retrying it again.
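
The per-task retries described above are a Hadoop setting, so one way to experiment with them is to pass a Hadoop property on the Sqoop command line. The sketch below assumes the Hadoop 2 property name `mapreduce.map.maxattempts` (maximum attempts per map task); `build_import_cmd` and every argument value are hypothetical, and the attempt count shown is arbitrary rather than a recommendation.

```python
import shlex


def build_import_cmd(connect, table, target_dir, max_attempts):
    """Assemble a `sqoop import` argument list with a custom map-task attempt limit.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    return [
        "sqoop", "import",
        # Generic Hadoop arguments (-D ...) must come right after the tool
        # name, before any Sqoop-specific options.
        "-D", "mapreduce.map.maxattempts={}".format(max_attempts),
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    cmd = build_import_cmd(
        "jdbc:mysql://db.example.com/sales",  # placeholder JDBC URL
        "orders",                             # placeholder source table
        "/data/orders",                       # placeholder HDFS directory
        6,                                    # arbitrary attempt limit
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Raising the limit only helps with short outages; if the database stays unreachable, the tasks exhaust their attempts and the job still fails as a whole.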
>
> Jarcec
>
> Links:
> 1: http://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_literal_sqoop_export_literal
>
>
> > On Jan 24, 2016, at 10:30 PM, Nitin Kumar <nk94.nitinkumar@gmail.com> wrote:
> >
> >
> > I am using apache sqoop 1.4.6 (distributed with HortonWorks HDP 2.3
> > package) to import and export data between rdbms systems and hdfs. I have
> > to deploy this in a production environment and was wondering about the
> > network resilience of sqoop.
> > Say I'm done with about 90% of the import/export job and there is a
> > network failure between the rdbms system and my hadoop cluster. Since sqoop
> > internally executes a map/reduce job for this I'm guessing the job will
> > fail completely and require a manual restart. In this regard I have the
> > following questions
> >
> >       • Does sqoop perform a clean up of the already imported/exported
> >         data?
> >       • Does sqoop automatically restart the job in the case of network
> >         failure?
> >       • If a manual clean up and restart is required, what other
> >         technology alongside sqoop do people generally use to achieve
> >         network resilience?
> >       • Is there a different version of sqoop that offers this feature?
> > Your answers and suggestions would be highly appreciated.
> >
> > Thanks!
> >
>
>
