sqoop-dev mailing list archives

From "Naresh AR (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-3436) sqoop imports data from oracle exadata has duplicates
Date Mon, 22 Apr 2019 19:35:00 GMT
Naresh AR created SQOOP-3436:
--------------------------------

             Summary: sqoop imports data from oracle exadata has duplicates
                 Key: SQOOP-3436
                 URL: https://issues.apache.org/jira/browse/SQOOP-3436
             Project: Sqoop
          Issue Type: Bug
          Components: sqoop2-build, sqoop2-jdbc-connector
    Affects Versions: 1.4.7
         Environment: sqoop1.4.7,hortonworks2.6.3
            Reporter: Naresh AR


Hi, I have used Sqoop to import data from Oracle Exadata, and the result contains complete
duplicate rows. At present we remove them with a DISTINCT query and dump the result into
another target table. Please suggest how to fix this.
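The DISTINCT workaround above keeps one copy of each fully identical row. As a minimal sketch
of the same whole-row semantics, assuming a small sample of the imported delimited files has
been copied to a local machine, the shell functions below print each distinct line once (in
first-seen order) and count lines that repeat an earlier line. The function names
`dedup_rows` and `count_duplicate_rows` are hypothetical. This is only meant for spot-checking
a sample, not for deduplicating the full ~100 GB table, and it does not address why the
duplicates appear in the first place.

```shell
# Hypothetical helpers for spot-checking a local sample of the imported data.
# A "complete row duplicate" is a line identical to an earlier line, so comparing
# whole lines ($0) mirrors SELECT DISTINCT for delimited text. This assumes rows
# contain no embedded newlines (--hive-drop-import-delims strips \n, \r and \01
# from string fields during the import).

# Print each distinct line once, keeping the order of first appearance.
dedup_rows() {
  awk '!seen[$0]++' "$@"
}

# Print how many lines are repeats of an earlier line (0 if none).
count_duplicate_rows() {
  awk 'seen[$0]++ { dup++ } END { print dup + 0 }' "$@"
}
```

Example usage on a hypothetical HDFS path:
`hdfs dfs -cat '/hypothetical/warehouse/table/part-m-*' | count_duplicate_rows`.
Both functions read files given as arguments or standard input, so they work on a local copy
or a pipe alike.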

Background on the Oracle tables:

The Oracle tables used for the Sqoop import have no simple primary key: they are SCD Type 2
tables whose primary keys are composite keys, which do not suit the --split-by option. The
tables are also very large (about 100 GB).

Command used for the Sqoop import from Oracle Exadata:

sqoop import --connect %s@//%s:%s/%s --username %s --password %s --table %s.%s \
    --fields-terminated-by '%s' --hive-drop-import-delims --hive-import --hive-overwrite \
    --hive-table %s.%s --null-string '\\\N' --null-non-string '\\\N' -m %s --fetch-size=2500



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
