sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sai chaitanya tirumerla <tirume...@gmail.com>
Subject Sqoop Export to sql server bugs
Date Thu, 04 Dec 2014 17:59:08 GMT
Hi,

We have found several bugs while doing sqoop export from s3 to sql server ,
not sure if there were any patches released for these.

Sqoop command used to export the data :

sqoop export \
--driver  com.microsoft.sqlserver.jdbc.SQLServerDriver \
--connect “<connect string>" \
--username “<username>" \
--password “<password>" \
--update-key audience_member_id \
--table “schema_name.tablename" \
--columns
audience_member_id,addr_1,addr_2,addr_3,email,email_alt,first_name,last_name,middle_name,phone_alt,phone_day,phone_fax
\
--export-dir “<s3 path>" \
--escaped-by '\\' --enclosed-by '\"' --fields-terminated-by ‘,'


Here are the datatypes of the both target table and also source table (
hive ) :



*APPLICATION_LOGICAL_NAME*
*SQLSRV_DATABASE_NAME*
*SCHEMA_NAME*
*TABLE_NAME*
APR
BRK
abc
audience


*COLUMN_NAME*
*COLUMN_POSITION*
*DATA_TYPE_HIVE*
*DATA_TYPE_TARGET*
*DATA_MAX_SIZE*

*NOTES*
ADDR_1
1
string
nvarchar
75

ADDR_2
2
string
nvarchar
75

ADDR_3
3
string
nvarchar
75


EMAIL
4
string
nvarchar
255


EMAIL_ALT
5
string
nvarchar
255


FIRST_NAME
6
string
nvarchar
75


LAST_NAME
7
string
nvarchar
75

MIDDLE_NAME
8
string
nvarchar
75


PHONE_ALT
9
string
nvarchar
20


PHONE_DAY
10
string
nvarchar
20

PHONE_FAX
11
string
nvarchar
20




We ran Sql trace to find out what exactly was going on with sqoop export ,
here are the analysis we did :


·        *High level explanation of the captured SQL statements generated
by the sqoop export*
o   The SQL statement uses bind variables, see details in
http://msdn.microsoft.com/en-us/expression/ff848812(v=sql.90).aspx
<https://connect.autodesk.com/owa/redir.aspx?C=J0oWQ6CjKkGswapMV0tR9kZEtNp949EI7iZxp3uGI6xOuSN_W7L3GZV00ant-378DMMUR9s6U90.&URL=http%3a%2f%2fmsdn.microsoft.com%2fen-us%2fexpression%2fff848812(v%3dsql.90).aspx>
o   The values to be assigned to the bind variables are not assigned by
reference but by position
o   The position is decided based on the data dictionary in the current
environment by column ordinal position
o   The column ordinal position does/may not match with the one in the Hive
tables
o   The generated statement is incomplete and does not follow the “best”
usage described in the Microsoft documentation


·        *Detailed explanation of the captured SQL statements generated by
the sqoop export*


o   Below the generated statement, beautified for human readability and on
the side the corrected version/comments


*original*
*fix*
*comments*
declare @p1 int
set @p1=1
exec sp_prepexec @p1 output,
N'
@P0 nvarchar(4000)
,@P1 nvarchar(4000)
,@P2 nvarchar(4000)
,@P3 nvarchar(4000)
,@P4 nvarchar(4000)
,@P5 nvarchar(4000)
,@P6 nvarchar(4000)
,@P7 nvarchar(4000)
,@P8 nvarchar(4000)
,@P9 nvarchar(4000)
,@P10 nvarchar(4000)
,@P11 int',
N'
UPDATE schema_name.table_name SET
 addr_1=@P0
,addr_2=@P1
,addr_3=@P2
,email=@P3
,email_alt=@P4
,first_name=@P5
,last_name=@P6
,middle_name=@P7
,phone_alt=@P8
,phone_day=@P9
,phone_fax=@P10
WHERE audience_member_id=@P11',
NULL
,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
,N'n'
,N'n'
,N'BECBLJOEKALDEAMDEKOFFJMFJDMIDGIK@null.autodesk.com'
,N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM'
,N'n'
,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH@null.autodesk.com'
,NULL
,N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB'
,1314617397
select @p1 as Handle
exec sp_unprepare 1


declare @p1 int;
--set @p1=1;
exec sp_prepexec @p1 output,
N'
@P0 nvarchar(4000)
,@P1 nvarchar(4000)
,@P2 nvarchar(4000)
,@P3 nvarchar(4000)
,@P4 nvarchar(4000)
,@P5 nvarchar(4000)
,@P6 nvarchar(4000)
,@P7 nvarchar(4000)
,@P8 nvarchar(4000)
,@P9 nvarchar(4000)
,@P10 nvarchar(4000)
,@P11 int',
N'
UPDATE schema_name.table_name SET
 addr_1=@P0
,addr_2=@P1
,addr_3=@P2
,email=@P3
,email_alt=@P4
,first_name=@P5
,last_name=@P6
,middle_name=@P7
,phone_alt=@P8
,phone_day=@P9
,phone_fax=@P10
WHERE audience_member_id=@P11',
@P1=NULL
,@P6=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
,@P7=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
,@P8=N'n'
,@P9=N'n'
,@P3=N'BECBLJOEKALDEAMDEKOFFJMFJDMIDGIK@null.autodesk.com'
,@P0=N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM'
,@P10=N'n'
,@P4=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH@null.autodesk.com'
,@P5=NULL
,@P2=N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB'
,@P11=1314617397;
select @p1 as Handle;
exec sp_unprepare @p1;


if set @p1=1is used, we get this error: Could not find prepared statement
with handle 1.


Added ; to separate the statements (coding best practice when generating
dynamic sql)










































Assigned the correct bind variable to its value
This will allow us to be dynamic and specific.


















Thesp_unprepare1 makes no sense since the @p1 is the correct handle value


So what you can see here is the columns are scrambled when trying to export
from sqoop to sql server.

Can somebody let me know if this is a known issue or am i missing something
to debug this.



Thanks,
Sai

Mime
View raw message