My use case is to do a full import periodically. I looked at the incremental imports, and it seems they can only be used in combination with the --where option if I want to download specific chunk sizes.
The --num-mappers parameter limits the number of parallel map tasks importing your data, which should decrease the load on your server. However, you're right that by limiting --num-mappers to a small number, you increase the amount of data that each mapper transfers.
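As a sketch (the connection URL, username, and table name below are placeholders, not from the original thread), a full-table import throttled to two parallel mappers might look like:

```shell
# Hypothetical connection details -- substitute your own host/db/table.
sqoop import \
  --connect jdbc:mysql://db.example.com/mydb \
  --username sqoop_user \
  --table orders \
  --num-mappers 2   # at most 2 concurrent map tasks hit the MySQL server
```

Fewer mappers means fewer simultaneous connections, but each mapper's query covers a larger slice of the table.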
Another way of limiting the imported data is the --where parameter (for table imports), which can be basically anything that will be passed into the WHERE clause of the generated SQL statement. You can limit the imported data with --where and thus form your batches almost arbitrarily. For example, if your table has an auto-increment integer primary key, you can very easily specify the range of keys that you want to import in each call.
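For instance, assuming a table with an auto-increment primary key column `id` (the column, table, and connection details here are illustrative), each batch could select one key range and write to its own directory:

```shell
# Batch 1: import only rows with 0 <= id < 1000000.
# Advance the range (and the target dir) on each subsequent run.
sqoop import \
  --connect jdbc:mysql://db.example.com/mydb \
  --username sqoop_user \
  --table orders \
  --where "id >= 0 AND id < 1000000" \
  --num-mappers 2 \
  --target-dir /data/orders/batch_0
```

Running batches sequentially this way caps the server load at whatever --num-mappers allows for a single batch.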
I'm not sure what your use case is, but it appears to me that you're importing your tables on a periodic basis, each time as a full import. If that is right, you might consider Sqoop's "incremental import" support:
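A minimal sketch of an append-mode incremental import (the check column, last value, and connection details are illustrative); after each run, Sqoop prints the value to pass as --last-value next time:

```shell
# Import only rows whose id exceeds the previous run's high-water mark.
sqoop import \
  --connect jdbc:mysql://db.example.com/mydb \
  --username sqoop_user \
  --table orders \
  --incremental append \
  --check-column id \
  --last-value 1000000
```

This avoids re-downloading rows that were already imported, so each periodic run only transfers the delta.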
On Thu, May 24, 2012 at 12:04:22AM -0700, Brian Tran wrote:
> Hi Sqoop gurus,
> I currently use Sqoop to import from MySQL into HDFS.
> Some of the tables that I import have become significantly larger to the
> point that a full dump significantly slows down the host.
> I would like to split the imports into smaller chunks, but limit the number
> of chunks I download in parallel to avoid significant load on the server.
> Is there anything in Sqoop that provides this functionality?
> The closest thing I could find in the Sqoop user guide was the
> --num-mappers option, but using it to download in smaller chunks would
> increase the server load as all the chunks are downloaded in parallel.