sqoop-user mailing list archives

From Erzsebet Szilagyi <liz.szila...@cloudera.com>
Subject Re: Does Sqoop support small queries?
Date Thu, 30 Jun 2016 22:50:47 GMT
Hi Wei,
Markus (in CC) offered the following explanation:

"
Sqoop 1 defaults to 4 map tasks. When working with customers, I usually
start with 1 and keep doubling the number of map tasks (1, 2, 4, 8, ...)
until I find a performance sweet spot, while keeping the potential RDBMS
impact in mind.

Estimating the real RDBMS impact is often challenging for the following
reasons:
1. DBAs are often not present.
2. Jobs are often reviewed in isolation, ignoring other concurrent Sqoop
and non-Sqoop workloads.
3. Tests are often run against smaller data volumes and/or virtual
resources than production will have (this applies to the RDBMS, the
network, and the Hadoop cluster).
4. There is no uniform way to monitor or analyze impact across RDBMS
vendors.
4.1. I have not really tried reviewing Sqoop's console debug output for
DB impact; perhaps it could be used.
5. Once a job is deployed, production data volumes often change.

Thanks, Markus
"
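Markus's doubling approach can be scripted as a sweep. This is a minimal sketch assuming Sqoop 1's `import` tool with its `--connect`, `--table`, `--split-by`, `--num-mappers`, and `--target-dir` options. The JDBC URL, table, and column names are hypothetical. The `pick_sweet_spot` rule (stop once doubling the mapper count saves less than 15% of runtime) is an illustrative assumption, and it ignores the RDBMS-side impact he warns about.

```python
import shlex


def mapper_counts(limit=8):
    """Yield 1, 2, 4, ... up to `limit`, doubling each time."""
    n = 1
    while n <= limit:
        yield n
        n *= 2


def sqoop_import_cmd(connect, table, split_by, mappers, target_dir):
    """Build the argv for one Sqoop 1 import trial with a given mapper count."""
    return [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--split-by", split_by,
        "--num-mappers", str(mappers),
        "--target-dir", f"{target_dir}/m{mappers}",  # separate output dir per trial
    ]


def pick_sweet_spot(timings, min_gain=0.15):
    """Given {mappers: seconds}, keep doubling while each step cuts runtime by >= min_gain.

    Assumed heuristic: it looks only at wall-clock time, not database load.
    """
    counts = sorted(timings)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        gain = (timings[prev] - timings[cur]) / timings[prev]
        if gain < min_gain:
            break
        best = cur
    return best


if __name__ == "__main__":
    # Hypothetical connection details; substitute your own.
    for m in mapper_counts(8):
        print(shlex.join(sqoop_import_cmd(
            "jdbc:mysql://db.example.com/sales", "orders", "id", m, "/tmp/sweep")))
```

After running each printed command and recording its runtime, pass the `{mappers: seconds}` results to `pick_sweet_spot`. The DB-side checks Markus lists still have to be done separately.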

On Wed, Jun 29, 2016 at 7:35 PM, Wei Yan <ywskycn@gmail.com> wrote:

> Hi,
>
> I would like to check whether Sqoop supports this type of ingestion:
> suppose we have records with keys in the range [1, 12] and 3 mappers. By
> default, the 3 mappers will be assigned [1, 4], [5, 8], and [9, 12].
>
> Can we split the range into smaller pieces, like [1], [2], [3], ...,
> [12], while still using 3 mappers instead of 12? We want this feature
> because: (1) if we configure fewer mappers, each mapper is assigned a
> larger range and takes much longer to finish, and the infrastructure may
> kill long-running queries; (2) if we configure more mappers, each mapper
> gets a smaller range, but we generate a lot of network traffic to the
> database, which is also bad. What we want is: still 12 ranges, but 3
> mappers, with at most 3 concurrent connections.
>
> Appreciate any help here.
>
> -Wei
>
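Wei's "12 ranges, 3 mappers" idea can be modeled as a fixed-size worker pool draining a queue of small ranges. This is a plain-Python sketch under that assumption, not Sqoop's own splitting code. `even_splits`, `run_chunks`, and `ConcurrencyProbe` are hypothetical names, and the probe stands in for a real database connection so that peak concurrency can be observed.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def even_splits(lo, hi, n):
    """Cut the inclusive integer range [lo, hi] into n contiguous, near-equal splits."""
    total = hi - lo + 1
    base, extra = divmod(total, n)
    splits, start = [], lo
    for i in range(n):
        size = base + (1 if i < extra else 0)  # first `extra` splits get one more key
        splits.append((start, start + size - 1))
        start += size
    return splits


class ConcurrencyProbe:
    """Hypothetical stand-in for a database: records how many queries run at once."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def fetch(self, lo, hi):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # pretend this is a short range query
        with self._lock:
            self.active -= 1
        return list(range(lo, hi + 1))


def run_chunks(chunks, workers, fetch):
    """Run fetch(lo, hi) for every chunk, with at most `workers` calls in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: fetch(*c), chunks))


if __name__ == "__main__":
    print(even_splits(1, 12, 3))  # [(1, 4), (5, 8), (9, 12)]
    probe = ConcurrencyProbe()
    results = run_chunks(even_splits(1, 12, 12), workers=3, fetch=probe.fetch)
    print([row for chunk in results for row in chunk])
    print("peak concurrent queries:", probe.peak)
```

Because the pool has 3 workers, no more than 3 fetches are ever in flight, while each individual query covers only one key. This is the trade-off Wei describes between short queries and a bounded number of connections.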



-- 
Erzsebet Szilagyi
Software Engineer
http://www.cloudera.com
