sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Krumm <g...@gregkrumm.com>
Subject Re: Poor Sqoop to SQL Server Performance
Date Thu, 15 Jan 2015 17:31:10 GMT
Does anyone have any insight into my previous post?

Greg


On Wed, Dec 17, 2014 at 11:25 AM, Krumm, Gregory <Gregory_Krumm@mcafee.com>
wrote:

>  Hello,
>
>  I am working on utilizing Sqoop to export data from HDFS to SQL Server
> 2012 but cannot seem to get very good performance.
>
>  Below are some details about what I’m trying to do:
>
>    - Export 100 million, 3-column records per day (totaling about 4GB)
>    - Records are stored in CSV format in part files of about 20MB
>    - Utilize our current cluster running Sqoop 1.4.2 and Hadoop 0.20
>    - Eventually utilize our new cluster running Sqoop 1.4.5 and Hadoop
>    2.4 (Currently doesn’t have YARN, so we have had to run
>    HADOOP_MAPREDUCE_MODE as Classic)
>
> While running Sqoop 1.4.2 using the sqljdbc4.jar, Sqoop only transfers
> about 1 million records per hour (50MB).
> Running Sqoop 1.4.5 using sqljdbc41.jar, Sqoop only transfers about 2
> million records per hour (100MB).
>
>  Below is the Sqoop command I’m executing for this job.  Based on many
> executions, the records.per.statement, and statements.per.transaction,
> never seem to change performance.
>
>  *sqoop export -D sqoop.export.records.per.statement=100 -D
> sqoop.export.statements.per.transaction=100 --connect
> 'jdbc:sqlserver://[SERVER_NAME];username=[USERNAME];password=[PASSWORD];database=[DB
> NAME]' —table=[TABLE_NAME] --export-dir /[DATA DIR]
> --input-fields-terminated-by ',' --input-lines-terminated-by '\n' —verbose*
>
>  Adding the batch or direct flags have only ever made performance worse
> so at the moment, they are not included.
>
>
>
>  When running the above statement, we have recorded network captures of
> the Sqoop traffic to SQL server, as well as recorded SQL Server activity
> via the Profiler.  Both of these captures show only single records being
> inserted at a time into SQL Server.
>
>  Our SQL Server activity ends up looking like this:
>
>     RPC:Completed exec sp_execute
> 116097,N'ffdbb68a84686edd029bde9766192b15',N'north_america',37 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdbdc2bb770a13a3b4eb86836341ab3',N'south_america',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdbfefc63e77861d708f3f5d82086fa',N'eastern_asia',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdc59a3a42aa2ef0fb60fd3f604a96c',N'southeast_asia',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdcbd667479a412923d03c6f305eb11',N'eastern_asia',12 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdd1cc279a54921c97cef463c6773fb',N'western_europe',37 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdd437610d1dc1cb8aa77f04bf21581',N'south_america',10 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdd8204335231b2138dab577bc8d906',N'northern_europe',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffde22f7f6a9d2913336f44d9db269e4',N'north_america',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffde2e3ae2a90ad5796e1d6f5fb3e1f2',N'eastern_asia',13 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffde56a01846c01225145ed043b2f43b',N'north_america',2 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffde90f6f14a00533327cc39562f086f',N'south_america',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdea65ab1b1091150924b5d89d441c2',N'eastern_asia',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdeef3353f992fc74d5cfc54533aa49',N'north_america',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdefae31deecb4ff80e76bea129e4dd',N'western_europe',1 Microsoft
> JDBC Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdf11270af0e9ee7841355e966d99b7',N'south_asia',1 Microsoft JDBC
> Driver for SQL Server  RPC:Completed exec sp_execute
> 116097,N'ffdf71938015bb64e371aaedb9700ab1',N'eastern_europe',1 Microsoft
> JDBC Driver for SQL Server
>
>
>  Now, if I export the same data to MySQL using Sqoop, it exports data at
> closer to 15 million records an hour (roughly 7 times faster).
>
>  Is this a known limitation to Sqoop or the Microsoft JDBC driver, or is
> there something I can change on the Sqoop or SQL server side to speed up
> this transfer?
>
>  Thanks,
>
>  Greg
>

Mime
View raw message