sqoop-user mailing list archives

From Jilani Shaik <jilani2...@gmail.com>
Subject Re: sqoop hbase incremental import - Sqoop 1.4.6
Date Mon, 06 Feb 2017 04:41:25 GMT
Hi Liz,

Let's say we inserted data into a table with an initial import. In the HBase
shell it looks like this:

 1                                     column=pay:amount, timestamp=1485129654025, value=4.99
 1                                     column=pay:customer_id, timestamp=1485129654025, value=1
 1                                     column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
 1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1                                     column=pay:rental_id, timestamp=1485129654025, value=573
 1                                     column=pay:staff_id, timestamp=1485129654025, value=1
 10                                    column=pay:amount, timestamp=1485129504390, value=5.99
 10                                    column=pay:customer_id, timestamp=1485129504390, value=1
 10                                    column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
 10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10                                    column=pay:rental_id, timestamp=1485129504390, value=4526
 10                                    column=pay:staff_id, timestamp=1485129504390, value=2


Now assume that in the source, rental_id becomes NULL for rowkey "1", and we
then run an incremental import into HBase. With the current import behavior,
the final HBase data after the incremental import looks like this:

 1                                     column=pay:amount, timestamp=1485129654025, value=4.99
 1                                     column=pay:customer_id, timestamp=1485129654025, value=1
 1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
 1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1                                     column=pay:rental_id, timestamp=1485129654025, value=573
 1                                     column=pay:staff_id, timestamp=1485129654025, value=1
 10                                    column=pay:amount, timestamp=1485129504390, value=5.99
 10                                    column=pay:customer_id, timestamp=1485129504390, value=1
 10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
 10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10                                    column=pay:rental_id, timestamp=1485129504390, value=126
 10                                    column=pay:staff_id, timestamp=1485129504390, value=2



Because the source column "rental_id" is now NULL for rowkey "1", the final
HBase table should not contain "rental_id" for that rowkey. I expect the
following data for these rowkeys:


 1                                     column=pay:amount, timestamp=1485129654025, value=4.99
 1                                     column=pay:customer_id, timestamp=1485129654025, value=1
 1                                     column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
 1                                     column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
 1                                     column=pay:staff_id, timestamp=1485129654025, value=1
 10                                    column=pay:amount, timestamp=1485129504390, value=5.99
 10                                    column=pay:customer_id, timestamp=1485129504390, value=1
 10                                    column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
 10                                    column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
 10                                    column=pay:rental_id, timestamp=1485129504390, value=126
 10                                    column=pay:staff_id, timestamp=1485129504390, value=2
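
To make the difference concrete, here is a minimal sketch in plain Python. It
is not Sqoop or HBase code; it only models one HBase row as a dict of
column -> value. The function name apply_incremental and the delete_nulls
switch are made up for illustration. It assumes, as far as I can tell from
the behavior above, that the incremental import only writes the non-null
columns of a changed source row and leaves everything else in HBase alone.

```python
def apply_incremental(hbase_row, source_row, delete_nulls):
    """Merge one changed source row into a copy of the existing HBase row.

    hbase_row:    dict of column -> value currently stored in HBase
    source_row:   dict of column -> value from the source (None means NULL)
    delete_nulls: hypothetical switch; False models what I see today,
                  True models the behavior I am asking for
    """
    merged = dict(hbase_row)
    for column, value in source_row.items():
        if value is not None:
            # Non-null source value: overwrite the cell.
            merged[column] = value
        elif delete_nulls:
            # NULL in source: remove the stale cell.
            merged.pop(column, None)
        # Otherwise the NULL is skipped and the old cell stays in place.
    return merged


# Rowkey "1" after the initial import.
existing = {
    "pay:amount": "4.99",
    "pay:customer_id": "1",
    "pay:last_update": "2017-01-23 05:29:09.0",
    "pay:payment_date": "2005-05-25 11:30:37.0",
    "pay:rental_id": "573",
    "pay:staff_id": "1",
}

# The same row in the source after rental_id was set to NULL.
updated_source = dict(existing)
updated_source["pay:last_update"] = "2017-02-05 05:29:09.0"
updated_source["pay:rental_id"] = None

current = apply_incremental(existing, updated_source, delete_nulls=False)
expected = apply_incremental(existing, updated_source, delete_nulls=True)

print("current :", sorted(current))
print("expected:", sorted(expected))
```

With delete_nulls=False the stale pay:rental_id cell survives, which matches
the second listing above. With delete_nulls=True it is gone, which matches
the third listing.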


Please let me know if you need anything further.


Thanks,
Jilani

On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
liz.szilagyi@cloudera.com> wrote:

> Hi Jilani,
> I'm not sure I completely understand what you are trying to do. Could you
> give us some examples with e.g. 4 columns and 2 rows of example data
> showing the changes that happen compared to the changes you'd like to see?
> Thanks,
> Liz
>
> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
> wrote:
>
> >
> > Please help in resolving this issue. I am going through the source code,
> > and the required behavior seems to be missing, but I am not sure whether
> > it was left out for a reason.
> >
> > Please suggest how to handle this scenario.
> >
> > Thanks,
> > Jilani
> >
> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> We have a scenario where we are importing data into HBase with Sqoop
> >> incremental import.
> >>
> >> Let's say we imported a table, and later some columns in the source
> >> table were updated to NULL for some rows. After an incremental import,
> >> these columns should no longer be present in the HBase table, but right
> >> now they remain there with their previous values.
> >>
> >> Is there any fix for this issue?
> >>
> >>
> >> Thanks,
> >> Jilani
> >>
> >
> >
>
