sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Krumm, Gregory" <Gregory_Kr...@McAfee.com>
Subject Poor Sqoop to SQL Server Performance
Date Wed, 17 Dec 2014 19:25:13 GMT
Hello,

I am working on utilizing Sqoop to export data from HDFS to SQL Server 2012 but cannot seem
to get very good performance.

Below are some details about what I’m trying to do:

  *   Export 100 million, 3-column records per day (totaling about 4GB)
  *   Records are stored in CSV format in part files of about 20MB
  *   Utilize our current cluster running Sqoop 1.4.2 and Hadoop 0.20
  *   Eventually utilize our new cluster running Sqoop 1.4.5 and Hadoop 2.4 (Currently doesn’t
have YARN, so we have had to run HADOOP_MAPREDUCE_MODE as Classic)

While running Sqoop 1.4.2 using the sqljdbc4.jar, Sqoop only transfers about 1 million records
per hour (50MB).
Running Sqoop 1.4.5 using sqljdbc41.jar, Sqoop only transfers about 2 million records per
hour (100MB).

Below is the Sqoop command I’m executing for this job.  Based on many executions, the records.per.statement,
and statements.per.transaction, never seem to change performance.

sqoop export -D sqoop.export.records.per.statement=100 -D sqoop.export.statements.per.transaction=100
--connect 'jdbc:sqlserver://[SERVER_NAME];username=[USERNAME];password=[PASSWORD];database=[DB
NAME]' —table=[TABLE_NAME] --export-dir /[DATA DIR] --input-fields-terminated-by ',' --input-lines-terminated-by
'\n' —verbose

Adding the batch or direct flags have only ever made performance worse so at the moment, they
are not included.



When running the above statement, we have recorded network captures of the Sqoop traffic to
SQL server, as well as recorded SQL Server activity via the Profiler.  Both of these captures
show only single records being inserted at a time into SQL Server.

Our SQL Server activity ends up looking like this:

RPC:Completed   exec sp_execute 116097,N'ffdbb68a84686edd029bde9766192b15',N'north_america',37
 Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdbdc2bb770a13a3b4eb86836341ab3',N'south_america',1
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdbfefc63e77861d708f3f5d82086fa',N'eastern_asia',1
   Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdc59a3a42aa2ef0fb60fd3f604a96c',N'southeast_asia',1
 Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdcbd667479a412923d03c6f305eb11',N'eastern_asia',12
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdd1cc279a54921c97cef463c6773fb',N'western_europe',37
Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdd437610d1dc1cb8aa77f04bf21581',N'south_america',10
 Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdd8204335231b2138dab577bc8d906',N'northern_europe',1
Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffde22f7f6a9d2913336f44d9db269e4',N'north_america',1
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffde2e3ae2a90ad5796e1d6f5fb3e1f2',N'eastern_asia',13
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffde56a01846c01225145ed043b2f43b',N'north_america',2
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffde90f6f14a00533327cc39562f086f',N'south_america',1
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdea65ab1b1091150924b5d89d441c2',N'eastern_asia',1
   Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdeef3353f992fc74d5cfc54533aa49',N'north_america',1
  Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdefae31deecb4ff80e76bea129e4dd',N'western_europe',1
 Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdf11270af0e9ee7841355e966d99b7',N'south_asia',1
     Microsoft JDBC Driver for SQL Server
RPC:Completed   exec sp_execute 116097,N'ffdf71938015bb64e371aaedb9700ab1',N'eastern_europe',1
 Microsoft JDBC Driver for SQL Server



Now, if I export the same data to MySQL using Sqoop, it exports data at closer to 15 million
records an hour (roughly 7 times faster).

Is this a known limitation to Sqoop or the Microsoft JDBC driver, or is there something I
can change on the Sqoop or SQL server side to speed up this transfer?

Thanks,

Greg
Mime
View raw message