sqoop-user mailing list archives

From Jarek Jarcec Cecho <jar...@apache.org>
Subject Re: Sqoop downloads split into chunks
Date Thu, 24 May 2012 07:19:41 GMT
Hi Brian,
the --num-mappers parameter limits the number of parallel tasks importing your data, which should
decrease the load on your server. However, you're right that setting --num-mappers to a small
number increases the amount of data transferred by each mapper.

Another way to limit the data transferred is the --where parameter (for table imports). Its value
can be basically anything and is passed into the WHERE clause of the generated SQL statement. With
--where you can restrict each run and thus shape your batches almost arbitrarily. For example,
if your table has an auto-increment integer primary key, you can very easily specify the range
of keys that you want to import in each call.
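
A minimal sketch of that key-range batching, written in Python purely for illustration. It only
builds and prints one sqoop command line per batch so the batches can be run one after another.
The connection string, the table name "orders", the key column "id", the target directories and
the helper names are all assumptions, not anything taken from Brian's setup.

```python
import shlex
from typing import Iterator, List, Tuple


def key_ranges(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open [lo, hi) key ranges that together cover min_id..max_id."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size, max_id + 1)
        yield lo, hi
        lo = hi


def sqoop_import_args(lo: int, hi: int) -> List[str]:
    """Build the argument list for one batch (all names below are hypothetical)."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/mydb",  # assumed JDBC URL
        "--table", "orders",                               # assumed table
        "--where", f"id >= {lo} AND id < {hi}",            # assumed key column "id"
        "--num-mappers", "1",                              # one reader per batch
        "--target-dir", f"/data/orders/ids_{lo}_{hi}",     # assumed HDFS path
    ]


if __name__ == "__main__":
    # Print one command per batch; run them sequentially to keep server load low.
    for lo, hi in key_ranges(1, 250000, 100000):
        print(" ".join(shlex.quote(arg) for arg in sqoop_import_args(lo, hi)))
```

Running the batches sequentially with a single mapper keeps at most one query open against the
server at a time; raising --num-mappers or the chunk size trades server load for total run time.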

I'm not sure what your use case is, but it appears to me that you're importing your tables
on a periodic basis, each time with a full import. If that is right, you might consider Sqoop's
"incremental import" support:

http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_incremental_imports
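
As a rough illustration of what such a call might look like, here is a small Python sketch that
builds an incremental import command in append mode. The --incremental, --check-column and
--last-value options are described in the user guide linked above; the connection string, table
name, check column "id", target directory and helper name are assumptions for the example.

```python
import shlex
from typing import List


def incremental_import_args(last_value: int) -> List[str]:
    """Build a sqoop command that imports only rows with id > last_value."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://db.example.com/mydb",  # assumed JDBC URL
        "--table", "orders",                               # assumed table
        "--incremental", "append",                         # only rows newer than last value
        "--check-column", "id",                            # assumed auto-increment key
        "--last-value", str(last_value),                   # highest id already imported
        "--num-mappers", "1",
        "--target-dir", "/data/orders",                    # assumed HDFS path
    ]


if __name__ == "__main__":
    # The caller is responsible for remembering the last imported id between runs.
    print(" ".join(shlex.quote(arg) for arg in incremental_import_args(250000)))
```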

Jarcec

On Thu, May 24, 2012 at 12:04:22AM -0700, Brian Tran wrote:
> Hi Sqoop gurus,
> 
> I currently use Sqoop to import from MySQL into HDFS.
> 
> Some of the tables that I import have become significantly larger to the
> point that a full dump significantly slows down the host.
> 
> I would like to split the imports into smaller chunks, but limit the number
> of chunks I download in parallel to avoid significant load on the server.
> 
> Is there anything in Sqoop that provides this functionality?
> 
> The closest thing I could find in the Sqoop user guide was the
> --num-mappers option, but using it to download in smaller chunks would
> increase the server load as all the chunks are downloaded in parallel.
> 
> Thanks!
> 
> Brian
