sqoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sachneet Singh Bains <sachneets.ba...@impetus.co.in>
Subject Sqoop import: Issue in extracting primary key for 'split-by'
Date Wed, 01 Apr 2015 08:29:42 GMT
Hi,

We are using Sqoop (1.4.5)to import tables from Teradata to Hadoop.
Came across an issue that was breaking the import every time. We debugged and resolved it.
Sharing as it may be of some help.

The Scenario and failure:
(I have renamed table and db for explanation)

*         Importing a table named TABLE1 (Primary Key - PK1) from database DB1

*         Not specifying the split by field in the import command.(Sqoop in this case fetches
the primary key and uses it as the split-by column)

*         The import fails every time saying PK2 does not exist in TABLE1 with the following
error
ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: com.teradata.jdbc.jdbc_4.util.JDBCException:
[Teradata Database] [TeraJDBC 14.10.00.26] [Error 5628] [SQLState HY000] Column PK2 not found
in DB1.TABLE1.
        at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:170)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)

The Issue and solution:
Issue: The  issue is that in the same Teradata DBMS there exists a table by the same name
TABLE1 (in different database DB2) with a different primary key -PK2.
Sqoop is using the primary key of this table instead of the correct one. Below code is the
problem:
org.apache.sqoop.manager.SqlManager.java (line number 559)
getPrimaryKeys(null, null, tableName) returns a result set of pk's for all the tables in the
DBMS and the code uses the first result without confirming the database it belongs to.
In our case the first result is from DB2 though DB1.TABLE1 was expected.
public String getPrimaryKey(String tableName) {
    try {
      DatabaseMetaData metaData = this.getConnection().getMetaData();
      ResultSet results = metaData.getPrimaryKeys(null, null, tableName);
      if (null == results) {
        return null;
      }

      try {
        if (results.next()) {
          return results.getString("COLUMN_NAME");
        } else {
          return null;
        }
Solution:

1.       If we use 'Split-by' explicitly in the import command this code won't be executed
and thus no issues.

2.       Using single mapper also avoids the issue as no splits are required in that case.

3.       Modify the code above to use the database name as well while fetching primary key.
Parse the database name from the URL and use it while getting the primary key:

ResultSet results = metaData.getPrimaryKeys(null, dbName, tableName);

Thanks,
Sachneet









________________________________






NOTE: This message may contain information that is confidential, proprietary, privileged or
otherwise protected by law. The message is intended solely for the named addressee. If received
in error, please destroy and notify the sender. Any use of this email is prohibited when received
in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this
communication has been maintained nor that the communication is free of errors, virus, interception
or interference.

Mime
View raw message