sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Brenden Cobb <Brenden.C...@humedica.com>
Subject Re: Sqoop to Oracle transfer rates
Date Mon, 23 Jun 2014 15:39:45 GMT
David – Thanks so much for your detailed responses. I will kick the tires a using your information
and see what happens.

-Brenden

From: David Robson <David.Robson@software.dell.com<mailto:David.Robson@software.dell.com>>
Reply-To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Date: Thursday, June 19, 2014 10:58 PM
To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Subject: RE: Sqoop to Oracle transfer rates

Hi Brenden,

OraOop can create a partitioned table for you – it will be partitioned by the date of the
load (one new partition for each export), then sub partitioned by mapper (so 24 mappers would
have 24 sub partitions). The easiest way is to use the template table parameter – so here
is the command I used – in this example TST_PRODUCT is my existing non-partitioned table.
TST_PRODUCT_EXP is a table that does not exist and will be created by OraOop – it will have
the same columns as TST_PRODUCT – but will add the columns I mentioned for partitioning.
You can find more information about it in the documentation: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector/blob/master/docs/oraoopuserguide.pdf?raw=true

sqoop export -Dmapred.map.tasks.speculative.execution=false -Doraoop.template.table=tst_product
-Doraoop.partitioned=true --connect jdbc:oracle:thin:@//hostname1:1521/service --username
username --password password --export-dir tst_product --table tst_product_exp --num-mappers
24

Depending on your use case the partitioning option may or may not work for you – but it
is good in that it reduces the load on the Oracle database by a lot as you can see – and
also massively improves loading time.

If you want to generate the TST_PRODUCT table I use in my testing you can run the OraOop integration
tests – which will generate it with 10,000 rows by default (you can change the parameter
to be any number of rows you like). There is also a command line interface to the integration
tests which we use to create tables for benchmarking purposes. The source code is available
here: https://github.com/QuestSoftwareTCD/OracleSQOOPconnector

Let me know how you go – I do the benchmarking on our Oracle products to make sure they
scale correctly so would be happy to help you out if you have any issues.

David

From: Brenden Cobb [mailto:Brenden.Cobb@humedica.com]
Sent: Friday, 20 June 2014 12:44 AM
To: user@sqoop.apache.org<mailto:user@sqoop.apache.org>
Subject: Re: Sqoop to Oracle transfer rates

Thanks David that is very informative. May I ask what partitioning method you utilized?

I've got Oracle licensing covered so that should not be an issue.

From: David Robson <David.Robson@software.dell.com<mailto:David.Robson@software.dell.com>>
Reply-To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Date: Wednesday, June 18, 2014 7:43 PM
To: "user@sqoop.apache.org<mailto:user@sqoop.apache.org>" <user@sqoop.apache.org<mailto:user@sqoop.apache.org>>
Subject: RE: Sqoop to Oracle transfer rates

Hi Brenden,

I did some benchmarking a while ago on a 62858MB folder. The best I achieved with 24 mappers
was 867 seconds (72.5MB/s). This was on a 4 node Hadoop cluster that was on some old servers
we had so I’m sure with some better hardware you could get a lot faster.

I’ll attach some graphs which you might find interesting – basically you can see OraOop
reduces load on the DB a little bit with a few optimizations – but once you use direct path
and partition exchange loading the benefits are massive. Of course this requires you to have
the relevant Oracle licenses so may not be an option.

David

From: Brenden Cobb [mailto:Brenden.Cobb@humedica.com]
Sent: Thursday, 19 June 2014 6:25 AM
To: user@sqoop.apache.org<mailto:user@sqoop.apache.org>
Subject: Sqoop to Oracle transfer rates

Looking for some benchmarks on Sqoop (or Oraoop) exports.

Appreciate if anyone feels like sharing some metrics. Rough numbers for MB/s would suffice,
perhaps number of mappers used. I'm working specifically with Oracle exports, but other RDBMS
export rates would be enlightening as well.

Thanks

Mime
View raw message