Hi guys,
To simplify my question, let's say I have a MySQL table called 'student' that looks like this:
+----+---------+-----+
| id | name    | sex |
+----+---------+-----+
|  1 | Alice   |   0 |
|  2 | Bob     |   1 |
|  3 | Charles |   1 |
+----+---------+-----+
I want to import this table into HBase periodically, which means I will run the same Sqoop job on a schedule.
There are two goals:
A. Every time a new record is inserted into the MySQL table, e.g. (4, David, 1), I want my
next Sqoop import to pick it up and put it in HBase.
B. If any updates have been made to MySQL rows 1, 2, 3, I want those updates
in HBase too after the next Sqoop import.
I checked the two incremental import modes Sqoop has: append mode seems to satisfy only goal
A, while lastmodified mode requires my MySQL table to have a timestamp column for each row (which
I don't have in real life). I know that if I don't use the incremental options at all, I can
get away with running a fresh import every time, but if my MySQL table is really huge,
a fresh import might be a performance killer.
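To make the gap concrete, here is a rough Python sketch of how I understand append mode. It is my own simplified model, not Sqoop's actual code: the function name `incremental_import`, the dict standing in for HBase, and the update of Bob to "Bobby" are all made up for illustration. It assumes append mode just selects rows whose check column (here `id`) is greater than the last imported value.

```python
# Simplified model of an append-style incremental import (NOT Sqoop's real code).
# "hbase" is a plain dict keyed by row id, so a re-imported row overwrites the old copy.

def incremental_import(mysql_rows, hbase, check_column, last_value):
    """Copy rows whose check_column exceeds last_value; return the new last value."""
    picked = [r for r in mysql_rows if r[check_column] > last_value]
    for row in picked:
        hbase[row["id"]] = dict(row)  # store a copy, like a put keyed on id
    return max([last_value] + [r[check_column] for r in picked])

mysql = [
    {"id": 1, "name": "Alice", "sex": 0},
    {"id": 2, "name": "Bob", "sex": 1},
    {"id": 3, "name": "Charles", "sex": 1},
]
hbase = {}
last_id = incremental_import(mysql, hbase, "id", 0)  # first run imports ids 1..3

# Between runs: one insert (goal A) and one hypothetical update to an old row (goal B).
mysql.append({"id": 4, "name": "David", "sex": 1})
mysql[1]["name"] = "Bobby"

last_id = incremental_import(mysql, hbase, "id", last_id)
print(hbase[4]["name"])  # the new row arrived (goal A)
print(hbase[2]["name"])  # the old row was not re-read, so the update is missing (goal B)
```

If the check column were a last-updated timestamp instead of `id`, the same filter would also pick up changed rows, which is why lastmodified mode needs a column my table does not have.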
Is there any way I can do incremental updates, instead of having to re-run the whole import,
to get NEW RECORDS + UPDATES ON OLD ROWS?
Shengjie