My use case is a periodic full import. I looked at incremental imports, but it seems they could only be used in combination with the --where option if I wanted to download specific chunk sizes.

I ended up writing a script that sets --boundary-query to select the rows within a given chunk range, e.g. "select 1,1000" to grab the first 1000 rows, and then runs the same Sqoop job with --boundary-query set to "select 1001,2000" for the next chunk.

Thanks everybody for the ideas that helped me reach this solution.

On Thu, May 24, 2012 at 12:19 AM, Jarek Jarcec Cecho wrote:

> Hi Brian,
> the --num-mappers parameter will limit the number of parallel threads
> exporting your data, which should decrease the load on your server.
> However, you're right that by limiting --num-mappers to a small number
> you will increase the amount of data transferred by each mapper.
>
> Another way of limiting the exported data is the --where parameter (for
> table imports), which can be basically anything that will be passed into
> the WHERE clause of the generated SQL statement. You can limit the
> exported data with --where and thus form your batches almost arbitrarily.
> For example, if your table has an auto-increment integer primary key, you
> can very easily specify the range of keys that you want to export in each
> call.
>
> I'm not sure what your use case is, but it appears to me that you're
> exporting your tables on a periodic basis, each time with a full import.
> If that is right, you might consider Sqoop's "incremental import" support:
>
> http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_incremental_imports
>
> Jarcec
>
> On Thu, May 24, 2012 at 12:04:22AM -0700, Brian Tran wrote:
> > Hi Sqoop gurus,
> >
> > I currently use Sqoop to import from MySQL into HDFS.
> >
> > Some of the tables that I import have become significantly larger, to
> > the point that a full dump significantly slows down the host.
> >
> > I would like to split the imports into smaller chunks, but limit the
> > number of chunks I download in parallel to avoid significant load on
> > the server.
> >
> > Is there anything in Sqoop that provides this functionality?
> >
> > The closest thing I could find in the Sqoop user guide was the
> > --num-mappers option, but using it to download in smaller chunks would
> > increase the server load as all the chunks are downloaded in parallel.
> >
> > Thanks!
> >
> > Brian
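
P.S. For anyone finding this thread later: the chunked --boundary-query approach described in the reply at the top could be scripted roughly as below. This is only a sketch, not the actual script. The JDBC URL, table name, key column, and HDFS path are hypothetical placeholders, and it assumes an integer key whose minimum and maximum are already known. Writing each chunk to its own --target-dir is also an assumption made here to keep the runs independent. The code only builds the command lines and does not invoke Sqoop.

```python
from typing import Iterator, List, Tuple


def chunk_ranges(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) key ranges that cover min_id..max_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        yield low, high
        low = high + 1


def sqoop_command(low: int, high: int, *, connect: str, table: str,
                  split_by: str, target_root: str, num_mappers: int = 1) -> List[str]:
    """Build the argument list for one chunked `sqoop import` run."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--split-by", split_by,
        # The boundary query returns the (min, max) of the split column;
        # a constant pair confines this run to one chunk, as in the thread.
        "--boundary-query", f"SELECT {low}, {high}",
        "--num-mappers", str(num_mappers),
        # Hypothetical layout: one HDFS directory per chunk.
        "--target-dir", f"{target_root}/{table}/{low}-{high}",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for low, high in chunk_ranges(1, 2500, 1000):
        cmd = sqoop_command(
            low, high,
            connect="jdbc:mysql://db.example.com/shop",
            table="orders",
            split_by="id",
            target_root="/data/import",
        )
        print(" ".join(cmd))
```

Passing each list to subprocess.run(cmd, check=True) one at a time would run the chunks sequentially, so only one chunk (with --num-mappers connections) hits the MySQL host at any moment.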