sqoop-user mailing list archives

From Jilani Shaik <jilani2...@gmail.com>
Subject Re: sqoop hbase incremental import - Sqoop 1.4.6
Date Sat, 25 Feb 2017 01:14:46 GMT
Is there any update on this?

I checked out the 1.4.6 code and made changes to achieve this. I tested
them on a cluster and they work as expected. Is there a way I can
contribute this as a patch, so that the committers can validate it and
suggest any changes required to move it forward? Please suggest the
approach.

Thanks,
Jilani
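
For readers following the thread, the difference between the current and
the requested behavior can be sketched as below. This is an illustrative
Python model only, not Sqoop or HBase code and not the patch mentioned
above: an HBase row is modeled as a plain dict, and the function names
apply_put_only and apply_put_and_delete are hypothetical. It assumes, as
the thread describes, that the current import writes only non-NULL columns.

```python
from typing import Dict, Optional

Row = Dict[str, str]      # column (family:qualifier) -> cell value
Table = Dict[str, Row]    # row key -> row


def apply_put_only(table: Table, row_key: str,
                   source_row: Dict[str, Optional[str]]) -> None:
    """Model of the behavior reported in the thread (an assumption):
    NULL source columns are skipped, so old cells stay in place."""
    row = table.setdefault(row_key, {})
    for column, value in source_row.items():
        if value is not None:
            row[column] = value


def apply_put_and_delete(table: Table, row_key: str,
                         source_row: Dict[str, Optional[str]]) -> None:
    """Model of the requested behavior: a NULL source column removes
    the existing cell, and non-NULL columns are written as before."""
    row = table.setdefault(row_key, {})
    for column, value in source_row.items():
        if value is None:
            row.pop(column, None)   # drop the stale cell if it exists
        else:
            row[column] = value


if __name__ == "__main__":
    # Row "1" from the example in the thread, reduced to three columns.
    table: Table = {"1": {"pay:amount": "4.99",
                          "pay:last_update": "2017-01-23 05:29:09.0",
                          "pay:rental_id": "573"}}
    # Source row after the update: rental_id has become NULL.
    changed = {"pay:amount": "4.99",
               "pay:last_update": "2017-02-05 05:29:09.0",
               "pay:rental_id": None}
    apply_put_and_delete(table, "1", changed)
    print(table["1"])
```

In a real HBase client the removal step would correspond to issuing a
delete for the column alongside the puts. How that fits into the Sqoop
import code is not shown here.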

On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com> wrote:

> Hi Liz,
>
> Let's say we inserted data into a table with the initial import, and it
> looks like this in the HBase shell:
>
>  1   column=pay:amount, timestamp=1485129654025, value=4.99
>  1   column=pay:customer_id, timestamp=1485129654025, value=1
>  1   column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
>  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1   column=pay:rental_id, timestamp=1485129654025, value=573
>  1   column=pay:staff_id, timestamp=1485129654025, value=1
>  10  column=pay:amount, timestamp=1485129504390, value=5.99
>  10  column=pay:customer_id, timestamp=1485129504390, value=1
>  10  column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
>  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10  column=pay:rental_id, timestamp=1485129504390, value=4526
>  10  column=pay:staff_id, timestamp=1485129504390, value=2
>
>
> Now assume that in the source, rental_id becomes NULL for rowkey "1", and
> we then run an incremental import into HBase. With the current import, the
> final HBase data after the incremental import looks like this:
>
>  1   column=pay:amount, timestamp=1485129654025, value=4.99
>  1   column=pay:customer_id, timestamp=1485129654025, value=1
>  1   column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1   column=pay:rental_id, timestamp=1485129654025, value=573
>  1   column=pay:staff_id, timestamp=1485129654025, value=1
>  10  column=pay:amount, timestamp=1485129504390, value=5.99
>  10  column=pay:customer_id, timestamp=1485129504390, value=1
>  10  column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10  column=pay:rental_id, timestamp=1485129504390, value=126
>  10  column=pay:staff_id, timestamp=1485129504390, value=2
>
>
>
> Because the source column "rental_id" became NULL for rowkey "1", the
> final HBase data should not contain "rental_id" for that rowkey. I am
> expecting the data below for these rowkeys:
>
>
>  1   column=pay:amount, timestamp=1485129654025, value=4.99
>  1   column=pay:customer_id, timestamp=1485129654025, value=1
>  1   column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
>  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
>  1   column=pay:staff_id, timestamp=1485129654025, value=1
>  10  column=pay:amount, timestamp=1485129504390, value=5.99
>  10  column=pay:customer_id, timestamp=1485129504390, value=1
>  10  column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
>  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
>  10  column=pay:rental_id, timestamp=1485129504390, value=126
>  10  column=pay:staff_id, timestamp=1485129504390, value=2
>
>
> Please let me know if anything further is required.
>
>
> Thanks,
> Jilani
>
> On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> liz.szilagyi@cloudera.com> wrote:
>
>> Hi Jilani,
>> I'm not sure I completely understand what you are trying to do. Could you
>> give us some examples with e.g. 4 columns and 2 rows of example data
>> showing the changes that happen compared to the changes you'd like to see?
>> Thanks,
>> Liz
>>
>> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>> wrote:
>>
>> >
>> > Please help in resolving this issue. I am going through the source
>> > code, and the required behavior seems to be missing, but I am not sure
>> > whether it was left out for some reason.
>> >
>> > Please suggest how to handle this scenario.
>> >
>> > Thanks,
>> > Jilani
>> >
>> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>> > wrote:
>> >
>> >> Hi,
>> >>
>> >> We have a scenario where we are importing data into HBase with sqoop
>> >> incremental import.
>> >>
>> >> Let's say we imported a table, and later some columns in the source
>> >> table were updated to NULL for some rows. After an incremental
>> >> import, these columns should, as per HBase, no longer be present in
>> >> the HBase table. But right now they remain there with their previous
>> >> values.
>> >>
>> >> Is there any fix to overcome this issue?
>> >>
>> >>
>> >> Thanks,
>> >> Jilani
>> >>
>> >
>> >
>>
>
>
