sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Szabolcs Vasas (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SQOOP-2986) Add validation check for --hive-import and --incremental lastmodified
Date Tue, 26 Jul 2016 13:45:20 GMT
Szabolcs Vasas created SQOOP-2986:
-------------------------------------

             Summary: Add validation check for --hive-import and --incremental lastmodified
                 Key: SQOOP-2986
                 URL: https://issues.apache.org/jira/browse/SQOOP-2986
             Project: Sqoop
          Issue Type: Bug
    Affects Versions: 1.4.6
            Reporter: Szabolcs Vasas
            Assignee: Szabolcs Vasas
             Fix For: 1.4.7


Sqoop import with --hive-import and --incremental lastmodified options is not supported, however
the application is able to run with these parameters but it produces unexpected results, the
output can contain duplicate rows.

Steps to reproduce the issue:

1) Create the necessary table for example in MySQL:

CREATE TABLE "employees" (
  "id" int(11) NOT NULL,
  "name" varchar(45) DEFAULT NULL,
  "salary" varchar(45) DEFAULT NULL,
  "change_date" datetime DEFAULT NULL,
  PRIMARY KEY ("id")
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (1,'employee1',1000,now());

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (2,'employee2','2000',now());

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (3,'employee3','3000',now());

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (4,'employee4','4000',now());

INSERT INTO `Employees` (`id`,`name`,`salary`,`change_date`) VALUES (5,'employee5','5000',now());

2) Import the table to Hive

sudo -u hdfs sqoop import --connect jdbc:mysql://servername:3306/sqoop --username sqoop --password
sqoop --table Employees --num-mappers 1 --hive-import --hive-table Employees 

3) Update some rows in MySQL:

UPDATE Employees SET salary=1010, change_date=now() where id=1;
UPDATE Employees SET salary=2010, change_date=now() where id=2;

4) Execute the incremental import command:

sudo -u hdfs sqoop import --verbose --connect jdbc:mysql://servername:3306/sqoop --username
sqoop --password sqoop --table Employees --incremental lastmodified --check-column change_date
--merge-key id --num-mappers 1 --hive-import --hive-table Employees --last-value "last_timestamp"

5) As a result employees with ids 1 and 2 will not be updated but we will see duplicate rows
in the Hive table.


The task is to introduce a fail-fast validation which will make the Sqoop import fail if it
was submitted with --hive-import and --incremental lastmodified options.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message