Markus (in CC) offered the following explanation:
The Sqoop1 default is 4 map
tasks. When working with customers, I usually start with 1 and double
the number of map tasks (e.g. 1, 2, 4, 8) until I find a performance
sweet spot, while keeping in mind the potential RDBMS impact.
Estimating the real RDBMS impact is often challenging, for reasons such as the following:
1. DBAs are often not present
2. Jobs are often reviewed in isolation (excluding other simultaneous Sqoop or non-Sqoop workloads)
3. Tests are often performed against smaller data volumes and/or virtual
resources than what will be in production (this includes the RDBMS, network and
Hadoop cluster)
4. There is no uniform way to monitor/analyze impact across RDBMS vendors.
4.1. I have not really tried to review the Sqoop console debug output from a DB impact context; perhaps it could be used.
5. Once deployed, production job volumes often change
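
The doubling approach Markus describes (1, 2, 4, 8 map tasks) could be sketched roughly as follows. This is only an illustration, not something from his message: the helper names, the 10% "worthwhile gain" threshold, the upper bound of 16 mappers and the stopping rule are all assumptions, and the connection string, table and target directory are placeholders. It measures elapsed time only and says nothing about RDBMS load, which still has to be watched separately for the reasons listed above.

```python
import subprocess
import time


def mapper_counts(max_mappers):
    """Yield 1, 2, 4, 8, ... without exceeding max_mappers."""
    n = 1
    while n <= max_mappers:
        yield n
        n *= 2


def build_command(connect, table, target_dir, num_mappers):
    """Build a Sqoop1 import command line for a given number of map tasks.

    Credentials are left out here; a real job would also need them.
    With more than one mapper, Sqoop also needs a primary key on the
    table or an explicit --split-by column.
    """
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
        "--delete-target-dir",  # so each trial starts from an empty directory
        "--num-mappers", str(num_mappers),
    ]


def run_and_time(cmd):
    """Run the command and return its wall-clock duration in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start


def find_sweet_spot(connect, table, target_dir,
                    max_mappers=16, min_gain=0.10, timed_run=run_and_time):
    """Double the mapper count until a run is less than min_gain faster.

    min_gain and max_mappers are assumed values, chosen for illustration.
    Returns (best_mapper_count, best_seconds).
    """
    best_n, best_seconds = None, None
    for n in mapper_counts(max_mappers):
        seconds = timed_run(build_command(connect, table, target_dir, n))
        # Stop once doubling no longer buys at least min_gain improvement.
        if best_seconds is not None and seconds > best_seconds * (1 - min_gain):
            break
        best_n, best_seconds = n, seconds
    return best_n, best_seconds
```

A call might look like `find_sweet_spot("jdbc:mysql://dbhost/sales", "orders", "/tmp/sqoop_trial")`, where every argument is a placeholder. Passing a different `timed_run` makes it possible to dry-run the logic without touching a database, and the result is only as representative as the test data volume and environment.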