sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gwen Shapira <gshap...@cloudera.com>
Subject Re: sqoop export -- mysql upsert performance
Date Mon, 23 Jun 2014 21:54:37 GMT
You are using super high number of mappers for very low amounts of data
(50MB or less) and getting very low throughput (less than 1MB/s)

Can you try same jobs with just 1 mapper? And 4 mappers?

Gwen



On Mon, Jun 23, 2014 at 2:32 PM, Buntu Dev <buntudev@gmail.com> wrote:

> Hi,
>
> We are using sqoop (v1.4.4) export for exporting the uniques per user_id
> into the mysql table with 2 integer columns and with 'user_id' as the
> unique key with these options:
>
>  sqoop export \
>             -Dsqoop.export.records.per.statement=1000 \
>             -Dsqoop.export.statements.per.transaction=1 \
>             --connect "jdbc:mysql://host/db" \
>             --username user \
>             --password pwd \
>             --table tbl \
>             --batch \
>             --relaxed-isolation \
>             --update-mode allowinsert \
>             --update-key user_id \
>             --export-dir output/dir/ \
>             --input-fields-terminated-by '\t' \
>             --input-lines-terminated-by '\n' \
>             --num-mappers=200
>
> Are the options such as batching, records/statement or statements per
> transaction applicable in case of the MySQL upserts?
>
> Also, we are noticing that for smaller jobs the throughput of the export
> job is higher compared to the larger jobs:
>
> large job:
> [ExportJobBase] - Transferred 37.3672 MB in 838.2908 seconds (45.6452
> KB/sec)
> [ExportJobBase] - Exported 3025677 records.
>
> small job:
> [ExportJobBase] - Transferred 12.0951 MB in 40.9846 seconds (302.1965
> KB/sec)
> [ExportJobBase] - Exported 88042 records.
>
> and bumping up the mappers to 400 has similar behavior as well:
>
> large job:
> [ExportJobBase] - Transferred 49.6578 MB in 638.6147 seconds (79.6249
> KB/sec)
> [ExportJobBase] - Exported 3243995 records.
>
> small job:
> [ExportJobBase] - Transferred 24.4653 MB in 59.1785 seconds (423.3366
> KB/sec)
> [ExportJobBase] - Exported 139181 records.
>
> Attempting to remove batch option or increasing the number of statements
> per transaction causes lock wait timeout exceeded exceptions.
>
> Please let me know if there is anything obvious we might be missing.
>
> Thanks!
>

Mime
View raw message