sqoop-user mailing list archives

From Buntu Dev <buntu...@gmail.com>
Subject Re: sqoop export -- mysql upsert performance
Date Sat, 28 Jun 2014 05:05:35 GMT
Thanks, Gwen, for the information.


On Thu, Jun 26, 2014 at 7:22 PM, Gwen Shapira <gshapira@cloudera.com> wrote:

> If you did not see it yet, check out this patch:
> https://issues.apache.org/jira/browse/SQOOP-1341
>
> It may solve your issue.
>
> If user_id is a primary key, you don't need to add a unique key. Primary
> keys are always unique.
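>
> For reference, a MySQL upsert keyed on the primary key needs no extra index.
> A rough sketch, with illustrative column names (the second integer column is
> assumed here to be called "uniques"):
>
>   INSERT INTO tbl (user_id, uniques)
>   VALUES (42, 17)
>   ON DUPLICATE KEY UPDATE uniques = VALUES(uniques);
>
> The ON DUPLICATE KEY UPDATE clause fires on a primary-key collision exactly
> as it would on any other unique index.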
>
> Gwen
>
>
> On Tue, Jun 24, 2014 at 11:22 AM, Buntu Dev <buntudev@gmail.com> wrote:
>
>> Thanks Gwen for the response.
>>
>> Tried with 4 mappers, 100 records.per.statement and 100
>> statements.per.transaction, and got much lower throughput:
>>
>> [ExportJobBase] - Transferred 788.9209 KB in 623.7671 seconds (1.2648
>> KB/sec)
>> [ExportJobBase] - Exported 63787 records.
>>
>> With 1 mapper it was too slow and we had to kill the job.
>>
>> When we tried 100/100/100 (mappers / records per statement / statements
>> per transaction), we noticed that for the same number of records the amount
>> of data transferred was 5.9855 MB vs. 788 KB in the previous run:
>>
>> [ExportJobBase] - Transferred 5.9855 MB in 47.9844 seconds (127.7323
>> KB/sec)
>> [ExportJobBase] - Exported 63787 records.
>>
>> Do we need to add a unique key on 'user_id' (it is already the primary
>> key), given that the INSERT statement constructed by Sqoop seems fine?
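>>
>> One way to confirm exactly what Sqoop sends is to turn on MySQL's general
>> query log for a small test export and inspect the captured statements (a
>> diagnostic sketch; it requires privileges to change global variables):
>>
>>   SET GLOBAL log_output = 'TABLE';
>>   SET GLOBAL general_log = 'ON';
>>   -- run a small sqoop export, then:
>>   SELECT event_time, argument FROM mysql.general_log
>>   WHERE argument LIKE 'INSERT INTO tbl%' ORDER BY event_time DESC LIMIT 5;
>>   SET GLOBAL general_log = 'OFF';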
>>
>>
>>
>> On Mon, Jun 23, 2014 at 2:54 PM, Gwen Shapira <gshapira@cloudera.com>
>> wrote:
>>
>>> You are using a very high number of mappers for a very small amount of
>>> data (50 MB or less) and getting very low throughput (less than 1 MB/s).
>>>
>>> Can you try the same jobs with just 1 mapper? And with 4 mappers?
>>>
>>> Gwen
>>>
>>>
>>>
>>> On Mon, Jun 23, 2014 at 2:32 PM, Buntu Dev <buntudev@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are using Sqoop (v1.4.4) export to export the uniques per user_id
>>>> into a MySQL table with 2 integer columns and 'user_id' as the unique
>>>> key, using these options:
>>>>
>>>>  sqoop export \
>>>>             -Dsqoop.export.records.per.statement=1000 \
>>>>             -Dsqoop.export.statements.per.transaction=1 \
>>>>             --connect "jdbc:mysql://host/db" \
>>>>             --username user \
>>>>             --password pwd \
>>>>             --table tbl \
>>>>             --batch \
>>>>             --relaxed-isolation \
>>>>             --update-mode allowinsert \
>>>>             --update-key user_id \
>>>>             --export-dir output/dir/ \
>>>>             --input-fields-terminated-by '\t' \
>>>>             --input-lines-terminated-by '\n' \
>>>>             --num-mappers=200
>>>>
>>>> Are options such as batching, records per statement, or statements per
>>>> transaction applicable in the case of MySQL upserts?
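>>>>
>>>> If records.per.statement is honored in upsert mode, the generated SQL
>>>> would be a multi-row upsert along these lines (table and column names
>>>> are illustrative):
>>>>
>>>>   INSERT INTO tbl (user_id, uniques)
>>>>   VALUES (1, 10), (2, 7), (3, 33)   -- up to records.per.statement rows
>>>>   ON DUPLICATE KEY UPDATE uniques = VALUES(uniques);
>>>>
>>>> statements.per.transaction would then control how many such statements
>>>> are grouped before each commit.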
>>>>
>>>> Also, we are noticing that for smaller jobs the throughput of the
>>>> export is higher than for the larger jobs:
>>>>
>>>> large job:
>>>> [ExportJobBase] - Transferred 37.3672 MB in 838.2908 seconds (45.6452
>>>> KB/sec)
>>>> [ExportJobBase] - Exported 3025677 records.
>>>>
>>>> small job:
>>>> [ExportJobBase] - Transferred 12.0951 MB in 40.9846 seconds (302.1965
>>>> KB/sec)
>>>> [ExportJobBase] - Exported 88042 records.
>>>>
>>>> Bumping the mappers up to 400 shows similar behavior:
>>>>
>>>> large job:
>>>> [ExportJobBase] - Transferred 49.6578 MB in 638.6147 seconds (79.6249
>>>> KB/sec)
>>>> [ExportJobBase] - Exported 3243995 records.
>>>>
>>>> small job:
>>>> [ExportJobBase] - Transferred 24.4653 MB in 59.1785 seconds (423.3366
>>>> KB/sec)
>>>> [ExportJobBase] - Exported 139181 records.
>>>>
>>>> Removing the --batch option or increasing the number of statements per
>>>> transaction causes "lock wait timeout exceeded" exceptions.
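>>>>
>>>> On the MySQL side, the contention can be inspected while the export runs
>>>> (a diagnostic sketch; the timeout value is only an example):
>>>>
>>>>   SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';
>>>>   SHOW ENGINE INNODB STATUS;  -- check the TRANSACTIONS section for lock waits
>>>>   SET GLOBAL innodb_lock_wait_timeout = 120;  -- raise from the 50s default if needed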
>>>>
>>>> Please let me know if there is anything obvious we might be missing.
>>>>
>>>> Thanks!
>>>>
>>>
>>>
>>
>
