sqoop-user mailing list archives

From shengjie min <kelvin....@gmail.com>
Subject Re: incremental updates mysql to HBase
Date Wed, 07 Aug 2013 16:12:08 GMT


> - Add an update_time column to the source table and do the incremental load by that update_time column.
Two questions about that, then:

1. Will the Sqoop job update the "update_time" field automatically when it runs, or does my application need to write that field?
2. What if I don't want to, or can't, touch the schema of the source table?
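
To make sure I understand the suggestion, here is a rough sketch of the job I think you mean. Python is used only to assemble the command line. The connection string, user name and column family are made up for illustration, and update_time is the proposed column, which does not exist in my table yet.

```python
import shlex
from typing import List


def build_incremental_import(last_value: str) -> List[str]:
    """Assemble a Sqoop import command for rows changed since last_value."""
    return [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/school",  # hypothetical host and database
        "--username", "sqoop_user",                 # hypothetical user
        "--table", "student",
        "--hbase-table", "student",
        "--column-family", "info",                  # hypothetical column family
        "--hbase-row-key", "id",
        "--incremental", "lastmodified",
        "--check-column", "update_time",            # the proposed new column
        "--last-value", last_value,
    ]


# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(build_incremental_import("2013-08-07 00:00:00")))
```

I assume each run would then need the previous run's timestamp passed in as the last value.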

Thanks,
Shengjie

On 8 Aug 2013, at 00:06, Joanne Chan <jchan@shutterstock.com> wrote:

> Add an update_time column to the source table and do the incremental load by that update_time column.
> 
> 
> On Wed, Aug 7, 2013 at 12:04 PM, shengjie min <kelvin.msj@gmail.com> wrote:
> Hi guys,
> 
> To simplify my question, let's say I have a MySQL table called 'student' that looks like this:
> 
> +----+---------+-----+
> | id | name    | sex |
> +----+---------+-----+
> |  1 | Alice   |   0 |
> |  2 | Bob     |   1 |
> |  3 | Charles |   1 |
> +----+---------+-----+
> 
> I want to import this table into HBase periodically, which means I will run this Sqoop job periodically. There are two goals:
> 
> A. Every time a new record is inserted into the MySQL table, e.g. (4, David, 1), I hope my next Sqoop import will catch it and put it in HBase.
> B. If any updates have been made to MySQL rows 1, 2, 3, I want those updates in HBase too after the next round of Sqoop import.
> 
> I checked the two types of incremental import Sqoop has: append mode seems to satisfy only goal A, while lastmodified mode requires my MySQL table to have a timestamp column for each row (which I don't have in real life). I know that if I don't use the incremental options at all, I can get away with running a fresh import every time, but if my MySQL table is really huge, a fresh import might be a performance killer.
> 
> Is there any way I can do incremental updates instead of having to re-run the whole import to get NEW RECORDS + UPDATES ON OLD ROWS?
> 
> 
> Shengjie
> 
> 
> 
> -- 
> -- JChan
> 

