sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sai chaitanya tirumerla <tirume...@gmail.com>
Subject Re: Sqoop Export to sql server bugs
Date Thu, 04 Dec 2014 22:33:32 GMT
Further to my analysis,

Here is the information from verbose option.
The columns selected is not matching the columns write order which is
messing up the data export from sqoop.

14/12/04 22:19:09 DEBUG orm.ClassWriter: selected columns:

14/12/04 22:19:09 DEBUG orm.ClassWriter:   audience_member_id

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_1

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_2

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_3

14/12/04 22:19:09 DEBUG orm.ClassWriter:   email

14/12/04 22:19:09 DEBUG orm.ClassWriter:   email_alt

14/12/04 22:19:09 DEBUG orm.ClassWriter:   first_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   last_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   middle_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_alt

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_day

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_fax

14/12/04 22:19:09 DEBUG orm.ClassWriter: db write column order:

14/12/04 22:19:09 DEBUG orm.ClassWriter:   first_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   middle_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   last_name

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_day

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_fax

14/12/04 22:19:09 DEBUG orm.ClassWriter:   email

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_1

14/12/04 22:19:09 DEBUG orm.ClassWriter:   phone_alt

14/12/04 22:19:09 DEBUG orm.ClassWriter:   email_alt

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_2

14/12/04 22:19:09 DEBUG orm.ClassWriter:   addr_3

14/12/04 22:19:09 DEBUG orm.ClassWriter:   audience_member_id

As you can see the selected columns is different from write column order
and the data export is scrambled.


Attaching screenshot

[image: Inline image 1]



Thanks,

Sai

On Thu, Dec 4, 2014 at 9:59 AM, sai chaitanya tirumerla <tirumerla@gmail.com
> wrote:

> Hi,
>
> We have found several bugs while doing sqoop export from s3 to sql server
> , not sure if there were any patches released for these.
>
> Sqoop command used to export the data :
>
> sqoop export \
> --driver  com.microsoft.sqlserver.jdbc.SQLServerDriver \
> --connect “<connect string>" \
> --username “<username>" \
> --password “<password>" \
> --update-key audience_member_id \
> --table “schema_name.tablename" \
> --columns
> audience_member_id,addr_1,addr_2,addr_3,email,email_alt,first_name,last_name,middle_name,phone_alt,phone_day,phone_fax
> \
> --export-dir “<s3 path>" \
> --escaped-by '\\' --enclosed-by '\"' --fields-terminated-by ‘,'
>
>
> Here are the datatypes of the both target table and also source table (
> hive ) :
>
>
>
> *APPLICATION_LOGICAL_NAME*
> *SQLSRV_DATABASE_NAME*
> *SCHEMA_NAME*
> *TABLE_NAME*
> APR
> BRK
> abc
> audience
>
>
> *COLUMN_NAME*
> *COLUMN_POSITION*
> *DATA_TYPE_HIVE*
> *DATA_TYPE_TARGET*
> *DATA_MAX_SIZE*
>
> *NOTES*
> ADDR_1
> 1
> string
> nvarchar
> 75
>
> ADDR_2
> 2
> string
> nvarchar
> 75
>
> ADDR_3
> 3
> string
> nvarchar
> 75
>
>
> EMAIL
> 4
> string
> nvarchar
> 255
>
>
> EMAIL_ALT
> 5
> string
> nvarchar
> 255
>
>
> FIRST_NAME
> 6
> string
> nvarchar
> 75
>
>
> LAST_NAME
> 7
> string
> nvarchar
> 75
>
> MIDDLE_NAME
> 8
> string
> nvarchar
> 75
>
>
> PHONE_ALT
> 9
> string
> nvarchar
> 20
>
>
> PHONE_DAY
> 10
> string
> nvarchar
> 20
>
> PHONE_FAX
> 11
> string
> nvarchar
> 20
>
>
>
>
> We ran Sql trace to find out what exactly was going on with sqoop export ,
> here are the analysis we did :
>
>
> ·        *High level explanation of the captured SQL statements generated
> by the sqoop export*
> o   The SQL statement uses bind variables, see details in
> http://msdn.microsoft.com/en-us/expression/ff848812(v=sql.90).aspx
> <https://connect.autodesk.com/owa/redir.aspx?C=J0oWQ6CjKkGswapMV0tR9kZEtNp949EI7iZxp3uGI6xOuSN_W7L3GZV00ant-378DMMUR9s6U90.&URL=http%3a%2f%2fmsdn.microsoft.com%2fen-us%2fexpression%2fff848812(v%3dsql.90).aspx>
> o   The values to be assigned to the bind variables are not assigned by
> reference but by position
> o   The position is decided based on the data dictionary in the current
> environment by column ordinal position
> o   The column ordinal position does/may not match with the one in the
> Hive tables
> o   The generated statement is incomplete and does not follow the “best”
> usage described in the Microsoft documentation
>
>
> ·        *Detailed explanation of the captured SQL statements generated
> by the sqoop export*
>
>
> o   Below the generated statement, beautified for human readability and
> on the side the corrected version/comments
>
>
> *original*
> *fix*
> *comments*
> declare @p1 int
> set @p1=1
> exec sp_prepexec @p1 output,
> N'
> @P0 nvarchar(4000)
> ,@P1 nvarchar(4000)
> ,@P2 nvarchar(4000)
> ,@P3 nvarchar(4000)
> ,@P4 nvarchar(4000)
> ,@P5 nvarchar(4000)
> ,@P6 nvarchar(4000)
> ,@P7 nvarchar(4000)
> ,@P8 nvarchar(4000)
> ,@P9 nvarchar(4000)
> ,@P10 nvarchar(4000)
> ,@P11 int',
> N'
> UPDATE schema_name.table_name SET
>  addr_1=@P0
> ,addr_2=@P1
> ,addr_3=@P2
> ,email=@P3
> ,email_alt=@P4
> ,first_name=@P5
> ,last_name=@P6
> ,middle_name=@P7
> ,phone_alt=@P8
> ,phone_day=@P9
> ,phone_fax=@P10
> WHERE audience_member_id=@P11',
> NULL
> ,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
> ,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
> ,N'n'
> ,N'n'
> ,N'BECBLJOEKALDEAMDEKOFFJMFJDMIDGIK@null.autodesk.com'
> ,N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM'
> ,N'n'
> ,N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH@null.autodesk.com'
> ,NULL
> ,N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB'
> ,1314617397
> select @p1 as Handle
> exec sp_unprepare 1
>
>
> declare @p1 int;
> --set @p1=1;
> exec sp_prepexec @p1 output,
> N'
> @P0 nvarchar(4000)
> ,@P1 nvarchar(4000)
> ,@P2 nvarchar(4000)
> ,@P3 nvarchar(4000)
> ,@P4 nvarchar(4000)
> ,@P5 nvarchar(4000)
> ,@P6 nvarchar(4000)
> ,@P7 nvarchar(4000)
> ,@P8 nvarchar(4000)
> ,@P9 nvarchar(4000)
> ,@P10 nvarchar(4000)
> ,@P11 int',
> N'
> UPDATE schema_name.table_name SET
>  addr_1=@P0
> ,addr_2=@P1
> ,addr_3=@P2
> ,email=@P3
> ,email_alt=@P4
> ,first_name=@P5
> ,last_name=@P6
> ,middle_name=@P7
> ,phone_alt=@P8
> ,phone_day=@P9
> ,phone_fax=@P10
> WHERE audience_member_id=@P11',
> @P1=NULL
> ,@P6=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
> ,@P7=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH'
> ,@P8=N'n'
> ,@P9=N'n'
> ,@P3=N'BECBLJOEKALDEAMDEKOFFJMFJDMIDGIK@null.autodesk.com'
> ,@P0=N'KFCMHFECDFPHLPDHPIEEFBMOBMBDNDOM'
> ,@P10=N'n'
> ,@P4=N'FCGFFADPCAECIJHNDNJCNGMPKIGKFMFH@null.autodesk.com'
> ,@P5=NULL
> ,@P2=N'NEBFMKNHCDLIMHGLFNIAFELDGGFJFIKB'
> ,@P11=1314617397;
> select @p1 as Handle;
> exec sp_unprepare @p1;
>
>
> if set @p1=1is used, we get this error: Could not find prepared statement
> with handle 1.
>
>
> Added ; to separate the statements (coding best practice when generating
> dynamic sql)
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Assigned the correct bind variable to its value
> This will allow us to be dynamic and specific.
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Thesp_unprepare1 makes no sense since the @p1 is the correct handle value
>
>
> So what you can see here is the columns are scrambled when trying to
> export from sqoop to sql server.
>
> Can somebody let me know if this is a known issue or am i missing
> something to debug this.
>
>
>
> Thanks,
> Sai
>
>
>
>

Mime
View raw message