sqoop-dev mailing list archives

From Boglarka Egyed <b...@cloudera.com>
Subject Re: sqoop hbase incremental import - Sqoop 1.4.6
Date Mon, 27 Feb 2017 09:05:29 GMT
Hi Jilani,

To get your change committed please do the following:
* Open a JIRA ticket for your change in Apache's JIRA system
<https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
* Create a review request at Apache's review board
<https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
ticket

Please consider the guidelines below:

Review board
* Summary: generate your summary using the issue's jira key + jira title
* Groups: add the relevant group so everyone on the project will know about
your patch (Sqoop)
* Bugs: add the issue's jira key so it's easy to navigate to the jira side
* Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
* And as soon as the patch gets committed, it's very useful for the
community if you close the review and mark it as "Submitted" on the Review
board. The button to do this is at the top right of your own tickets, right
next to the Download Diff button.

Jira
* Link: please add the link of the review as an external/web link so it's
easy to navigate to the reviews side
* Status: mark it as "patch available"

The Sqoop community will receive emails about your new ticket and review
request and will review your change.

Thanks,
Bogi


On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com> wrote:

> Is there any update?
>
> I checked out the 1.4.6 code, made code changes to achieve this, and
> tested them in a cluster; it is working as expected. Is there a way I can
> contribute this as a patch so that the committers can validate it further
> and suggest any changes required to move forward? Please suggest the
> approach.
>
> Thanks,
> Jilani
>
> On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
> wrote:
>
> > Hi Liz,
> >
> > Let's say we inserted data into a table with an initial import, and it
> > looks like this in the HBase shell:
> >
> >  1   column=pay:amount, timestamp=1485129654025, value=4.99
> >  1   column=pay:customer_id, timestamp=1485129654025, value=1
> >  1   column=pay:last_update, timestamp=1485129654025, value=2017-01-23 05:29:09.0
> >  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1   column=pay:rental_id, timestamp=1485129654025, value=573
> >  1   column=pay:staff_id, timestamp=1485129654025, value=1
> >  10  column=pay:amount, timestamp=1485129504390, value=5.99
> >  10  column=pay:customer_id, timestamp=1485129504390, value=1
> >  10  column=pay:last_update, timestamp=1485129504390, value=2006-02-15 22:12:30.0
> >  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10  column=pay:rental_id, timestamp=1485129504390, value=4526
> >  10  column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> > Now assume that in the source, rental_id becomes NULL for rowkey "1", and
> > then we do an incremental import into HBase. With the current import, the
> > final HBase data after the incremental import will look like this:
> >
> >  1   column=pay:amount, timestamp=1485129654025, value=4.99
> >  1   column=pay:customer_id, timestamp=1485129654025, value=1
> >  1   column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1   column=pay:rental_id, timestamp=1485129654025, value=573
> >  1   column=pay:staff_id, timestamp=1485129654025, value=1
> >  10  column=pay:amount, timestamp=1485129504390, value=5.99
> >  10  column=pay:customer_id, timestamp=1485129504390, value=1
> >  10  column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10  column=pay:rental_id, timestamp=1485129504390, value=126
> >  10  column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> >
> > Since the source column "rental_id" becomes NULL for rowkey "1", the final
> > HBase data should not have "rental_id" for rowkey "1". I am expecting the
> > data below for these rowkeys:
> >
> >
> >  1   column=pay:amount, timestamp=1485129654025, value=4.99
> >  1   column=pay:customer_id, timestamp=1485129654025, value=1
> >  1   column=pay:last_update, timestamp=1485129654025, value=2017-02-05 05:29:09.0
> >  1   column=pay:payment_date, timestamp=1485129654025, value=2005-05-25 11:30:37.0
> >  1   column=pay:staff_id, timestamp=1485129654025, value=1
> >  10  column=pay:amount, timestamp=1485129504390, value=5.99
> >  10  column=pay:customer_id, timestamp=1485129504390, value=1
> >  10  column=pay:last_update, timestamp=1485129504390, value=2017-02-05 05:12:30.0
> >  10  column=pay:payment_date, timestamp=1485129504390, value=2005-07-08 03:17:05.0
> >  10  column=pay:rental_id, timestamp=1485129504390, value=126
> >  10  column=pay:staff_id, timestamp=1485129504390, value=2
> >
> >
> > Please let me know if anything further is required.
> >
> >
> > Thanks,
> > Jilani
> >
> > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> > liz.szilagyi@cloudera.com> wrote:
> >
> >> Hi Jilani,
> >> I'm not sure I completely understand what you are trying to do. Could you
> >> give us some examples with e.g. 4 columns and 2 rows of example data
> >> showing the changes that happen compared to the changes you'd like to see?
> >> Thanks,
> >> Liz
> >>
> >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
> >> wrote:
> >>
> >> >
> >> > Please help in resolving this issue. I am going through the source code,
> >> > and somehow the required behavior is missing, but I am not sure whether
> >> > we avoided this behavior for some reason.
> >> >
> >> > Please give me some suggestions on how to handle this scenario.
> >> >
> >> > Thanks,
> >> > Jilani
> >> >
> >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> We have a scenario where we are importing data into HBase with Sqoop
> >> >> incremental import.
> >> >>
> >> >> Let's say we imported a table, and later the source table was updated so
> >> >> that some columns have null values for some rows. Then, when doing an
> >> >> incremental import, as per HBase these columns should not be present in
> >> >> the HBase table. But right now these columns remain as they are, with
> >> >> their previous values.
> >> >>
> >> >> Is there any fix to overcome this issue?
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Jilani
> >> >>
> >> >
> >> >
> >>
> >
> >
>
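
The difference between the current and the requested behavior discussed in this thread can be sketched with a small in-memory model. This is only an illustration in Python, not Sqoop or HBase code: `apply_incremental` and its `delete_nulls` flag are hypothetical names, and a plain dictionary stands in for the HBase table (timestamps are left out for brevity).

```python
from typing import Dict, Optional

Row = Dict[str, str]    # column name -> cell value
Table = Dict[str, Row]  # row key -> row


def apply_incremental(table: Table, row_key: str,
                      record: Dict[str, Optional[str]],
                      delete_nulls: bool) -> None:
    """Apply one incrementally imported source record to the model table.

    Hypothetical helper: with delete_nulls=False a NULL source column is
    skipped, so a previously imported value stays in place (the behavior
    reported in the thread). With delete_nulls=True the column is removed
    from the row (the behavior requested in the thread).
    """
    row = table.setdefault(row_key, {})
    for column, value in record.items():
        if value is not None:
            row[column] = value      # normal case: write the new value
        elif delete_nulls:
            row.pop(column, None)    # requested: drop the stale cell
        # otherwise the NULL is ignored and the old cell survives


# Row "1" after the initial import (only two columns shown).
table = {"1": {"pay:amount": "4.99", "pay:rental_id": "573"}}
# In the source, rental_id has since become NULL.
update = {"pay:amount": "4.99", "pay:rental_id": None}

apply_incremental(table, "1", update, delete_nulls=False)
print(table["1"])  # rental_id is still present with its old value

apply_incremental(table, "1", update, delete_nulls=True)
print(table["1"])  # rental_id has been removed
```

In a real change the removal would have to be expressed as an HBase delete for the affected column rather than a dictionary operation; the model above only shows the intended outcome.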
