sqoop-user mailing list archives

From: Brian Tran <br...@box.com>
Subject: Re: Sqoop downloads split into chunks
Date: Thu, 24 May 2012 22:15:32 GMT
My use case is to do a full import periodically. I looked at incremental
imports, and it seems they could only be used in combination with the
--where option if I wanted to download specific chunk sizes.

I ended up writing a script that sets --boundary-query to return the
boundary values for a given chunk, e.g. "SELECT 1, 1000" to grab the first
1000 rows, and then runs the same Sqoop job with --boundary-query set to
"SELECT 1001, 2000" for the next chunk.
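
In case it's useful to anyone else, here is a rough sketch of what the
script does. The connection string, table name, chunk size, and maximum id
are placeholders, and it assumes an autoincrement integer primary key
named "id":

    #!/bin/bash
    # Walk the id range in fixed-size chunks; each iteration runs the same
    # Sqoop job, with --boundary-query pinned to the chunk's min/max ids.
    CHUNK=1000
    MAX_ID=100000   # placeholder; in practice fetched via SELECT MAX(id)
    lo=1
    while [ "$lo" -le "$MAX_ID" ]; do
      hi=$((lo + CHUNK - 1))
      sqoop import \
        --connect jdbc:mysql://dbhost/mydb \
        --username dbuser --password dbpass \
        --table mytable \
        --split-by id \
        --num-mappers 1 \
        --boundary-query "SELECT $lo, $hi" \
        --target-dir /data/mytable/chunk_$lo
      lo=$((hi + 1))
    done

Using a separate --target-dir per chunk keeps each run from colliding with
the previous chunk's output.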

Thanks everybody for the ideas that helped me reach this solution.

On Thu, May 24, 2012 at 12:19 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:

> Hi Brian,
> the --num-mappers parameter will limit the number of parallel tasks
> importing your data, which should decrease the load on your server.
> However, you're right that by limiting --num-mappers to a small number,
> you will increase the amount of data transferred by each mapper.
>
> Another way of limiting the imported data is the --where parameter (for
> table imports), whose value can be basically anything that can be passed
> into the WHERE clause of the generated SQL statement. You can limit the
> imported data with --where and thus form your batches almost arbitrarily.
> For example, if your table has an autoincrement integer primary key, you
> can easily specify the range of keys that you want to import in each call.
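>
> For illustration, a single chunk could look something like this (the
> connection string, table, and column names are just examples):
>
>     sqoop import \
>       --connect jdbc:mysql://dbhost/mydb \
>       --table mytable \
>       --where "id >= 1 AND id <= 1000" \
>       --num-mappers 1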
>
> I'm not sure what your use case is, but it appears to me that you're
> importing your tables on a periodic basis, each time with a full import.
> If that is the case, you might consider Sqoop's "incremental import"
> support:
>
>
> http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_incremental_imports
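>
> As a quick illustration of that mode (again, table and column names are
> placeholders):
>
>     sqoop import \
>       --connect jdbc:mysql://dbhost/mydb \
>       --table mytable \
>       --incremental append \
>       --check-column id \
>       --last-value 0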
>
> Jarcec
>
> On Thu, May 24, 2012 at 12:04:22AM -0700, Brian Tran wrote:
> > Hi Sqoop gurus,
> >
> > I currently use Sqoop to import from MySQL into HDFS.
> >
> > Some of the tables that I import have grown to the point where a full
> > dump significantly slows down the host.
> >
> > I would like to split the imports into smaller chunks, but limit the
> > number of chunks I download in parallel to avoid significant load on
> > the server.
> >
> > Is there anything in Sqoop that provides this functionality?
> >
> > The closest thing I could find in the Sqoop user guide was the
> > --num-mappers option, but using it to download in smaller chunks would
> > increase the server load as all the chunks are downloaded in parallel.
> >
> > Thanks!
> >
> > Brian
>
