sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Robson <David.Rob...@software.dell.com>
Subject RE: Sqoop to Oracle transfer rates
Date Fri, 20 Jun 2014 02:58:03 GMT
Hi Brenden,

OraOop can create a partitioned table for you - it will be partitioned by the date of the
load (one new partition for each export), then sub partitioned by mapper (so 24 mappers would
have 24 sub partitions). The easiest way is to use the template table parameter - so here
is the command I used - in this example TST_PRODUCT is my existing non-partitioned table.
TST_PRODUCT_EXP is a table that does not exist and will be created by OraOop - it will have
the same columns as TST_PRODUCT - but will add the columns I mentioned for partitioning. You
can find more information about it in the documentation: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector/blob/master/docs/oraoopuserguide.pdf?raw=true

sqoop export -Dmapred.map.tasks.speculative.execution=false -Doraoop.template.table=tst_product
-Doraoop.partitioned=true --connect jdbc:oracle:thin:@//hostname1:1521/service --username
username --password password --export-dir tst_product --table tst_product_exp --num-mappers

Depending on your use case the partitioning option may or may not work for you - but it is
good in that it reduces the load on the Oracle database by a lot as you can see - and also
massively improves loading time.

If you want to generate the TST_PRODUCT table I use in my testing you can run the OraOop integration
tests - which will generate it with 10,000 rows by default (you can change the parameter to
be any number of rows you like). There is also a command line interface to the integration
tests which we use to create tables for benchmarking purposes. The source code is available
here: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector

Let me know how you go - I do the benchmarking on our Oracle products to make sure they scale
correctly so would be happy to help you out if you have any issues.


From: Brenden Cobb [mailto:Brenden.Cobb@humedica.com]
Sent: Friday, 20 June 2014 12:44 AM
To: user@sqoop.apache.org
Subject: Re: Sqoop to Oracle transfer rates

Thanks David that is very informative. May I ask what partitioning method you utilized?

I've got Oracle licensing covered so that should not be an issue.

From: David Robson <David.Robson@software.dell.com<mailto:David.Robson@software.dell.com>>
Reply-To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Date: Wednesday, June 18, 2014 7:43 PM
To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Subject: RE: Sqoop to Oracle transfer rates

Hi Brenden,

I did some benchmarking a while ago on a 62858MB folder. The best I achieved with 24 mappers
was 867 seconds (72.5MB/s). This was on a 4 node Hadoop cluster that was on some old servers
we had so I'm sure with some better hardware you could get a lot faster.

I'll attach some graphs which you might find interesting - basically you can see OraOop reduces
load on the DB a little bit with a few optimizations - but once you use direct path and partition
exchange loading the benefits are massive. Of course this requires you to have the relevant
Oracle licenses so may not be an option.


From: Brenden Cobb [mailto:Brenden.Cobb@humedica.com]
Sent: Thursday, 19 June 2014 6:25 AM
To: user@sqoop.apache.org<mailto:user@sqoop.apache.org>
Subject: Sqoop to Oracle transfer rates

Looking for some benchmarks on Sqoop (or Oraoop) exports.

Appreciate if anyone feels like sharing some metrics. Rough numbers for MB/s would suffice,
perhaps number of mappers used. I'm working specifically with Oracle exports, but other RDBMS
export rates would be enlightening as well.


View raw message