sqoop-dev mailing list archives

From "Yulei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3263) Duplicate rows found when split-by column is of textual type due to different charset difference of sqoop and hadoop
Date Sun, 26 Nov 2017 14:57:00 GMT
Yulei Yang created SQOOP-3263:
---------------------------------

             Summary: Duplicate rows found when split-by column is of textual type due to
different charset difference of sqoop and hadoop
                 Key: SQOOP-3263
                 URL: https://issues.apache.org/jira/browse/SQOOP-3263
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.6
            Reporter: Yulei Yang


This issue can occur with any RDBMS, because the root cause is not in the RDBMS. Steps
to reproduce this issue:
1. Create a MySQL table: create table ora_test (id varchar(32) primary key not null);
2. Insert *4* rows:
insert into ora_test values ('08125FC4C8FDA064E053C0A8028DA064');
insert into ora_test values ('4FFE68419D3502E2E0537F000001F3E8');
insert into ora_test values ('4FFF9CF5861E003EE0537F0000017FF7');
insert into ora_test values ('56DAC2D0F14901B0E0537F000001D3FA');
3. Import it into Hive with sqoop import -m 32 (-m 189 also reproduces it). Hive then
contains *6* rows instead of 4.
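
Step 3 does not show the full command line, so the following is only a sketch of what such an import might look like. The JDBC URL, database name, and user name are hypothetical placeholders, and passing `--split-by id` explicitly is an assumption (the report does not list the flags it used). The script only assembles and prints the command so that it can be reviewed before it is run against a real cluster.

```shell
# Hypothetical connection details; adjust them to your environment.
DB_URL="jdbc:mysql://localhost:3306/test"
DB_USER="sqoop_user"

# Number of mappers from step 3; the report says 189 also reproduces the issue.
MAPPERS=32

# Assemble the import command. --split-by id is an assumption made here to
# make the textual split column explicit; -P prompts for the password.
CMD="sqoop import --connect $DB_URL --username $DB_USER -P --table ora_test --split-by id --hive-import -m $MAPPERS"

# Print the command instead of executing it.
echo "$CMD"
```

After running the printed command, comparing `select count(*)` with `select count(distinct id)` on the Hive table is one way to confirm whether duplicate rows were imported.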




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
