sqoop-user mailing list archives

From Jilani Shaik <jilani2...@gmail.com>
Subject Re: sqoop hbase incremental import - Sqoop 1.4.6
Date Wed, 01 Mar 2017 02:46:42 GMT
Hi Bogi,

Could you point me to some sample JIRA tickets and review requests similar
to this one, so that I can proceed?

I applied my code changes on top of the "sqoop-release-1.4.6-rc0" branch of
the Sqoop git repository. If you can suggest the right branch, I will take
the code from there and apply my changes before submitting the review
request.

Thanks,
Jilani

On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com> wrote:

> Hi Jilani,
>
> To get your change committed please do the following:
> * Open a JIRA ticket for your change in Apache's JIRA system
> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
> * Create a review request at Apache's review board
> <https://reviews.apache.org/r/> for project Sqoop and link it to the JIRA
> ticket
>
> Please consider the guidelines below:
>
> Review board
> * Summary: generate your summary using the issue's jira key + jira title
> * Groups: add the relevant group so everyone on the project will know about
> your patch (Sqoop)
> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
> * As soon as the patch gets committed, it's very useful for the community
> if you close the review and mark it as "Submitted" on the Review board.
> The button for this is at the top right of your own review requests, right
> next to the Download Diff button.
>
> Jira
> * Link: please add the link of the review as an external/web link so it's
> easy to navigate to the reviews side
> * Status: mark it as "patch available"
>
> Sqoop community will receive emails about your new ticket and review
> request and will review your change.
>
> Thanks,
> Bogi
>
>
> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com>
> wrote:
>
> > Do we have any update?
> >
> > I checked out the 1.4.6 code, made code changes to achieve this, and
> > tested them on a cluster; it is working as expected. Is there a way I
> > can contribute this as a patch, so that the committers can validate it
> > and suggest any changes required to move forward? Please suggest the
> > approach.
> >
> > Thanks,
> > Jilani
> >
> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
> > wrote:
> >
> > > Hi Liz,
> > >
> > > > Let's say we inserted data into a table with the initial import,
> > > > and it looks like this in the HBase shell:
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:rental_id,
> > > timestamp=1485129654025, value=573
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=4526
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > > > Now assume that in the source, rental_id becomes NULL for rowkey
> > > > "1", and then we do an incremental import into HBase. With the
> > > > current import, the final HBase data after the incremental import
> > > > looks like this:
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:rental_id,
> > > timestamp=1485129654025, value=573
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=126
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > >
> > > > Since the source column "rental_id" became NULL for rowkey "1",
> > > > the final HBase data should not contain "rental_id" for rowkey
> > > > "1". I am expecting the data below for these rowkeys.
> > >
> > >
> > >  1                                     column=pay:amount,
> > > timestamp=1485129654025, value=4.99
> > >  1                                     column=pay:customer_id,
> > > timestamp=1485129654025, value=1
> > >  1                                     column=pay:last_update,
> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
> > >  1                                     column=pay:payment_date,
> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
> > >  1                                     column=pay:staff_id,
> > > timestamp=1485129654025, value=1
> > >  10                                    column=pay:amount,
> > > timestamp=1485129504390, value=5.99
> > >  10                                    column=pay:customer_id,
> > > timestamp=1485129504390, value=1
> > >  10                                    column=pay:last_update,
> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
> > >  10                                    column=pay:payment_date,
> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
> > >  10                                    column=pay:rental_id,
> > > timestamp=1485129504390, value=126
> > >  10                                    column=pay:staff_id,
> > > timestamp=1485129504390, value=2
> > >
> > >
> > > > Please let me know if anything further is required.
> > >
> > >
> > > Thanks,
> > > Jilani
> > >
> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi <
> > > liz.szilagyi@cloudera.com> wrote:
> > >
> > >> Hi Jilani,
> > > >> I'm not sure I completely understand what you are trying to do.
> > > >> Could you give us some examples with e.g. 4 columns and 2 rows of
> > > >> example data showing the changes that happen compared to the
> > > >> changes you'd like to see?
> > >> Thanks,
> > >> Liz
> > >>
> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
> > >> wrote:
> > >>
> > >> >
> > > >> > Please help in resolving this issue. I am going through the
> > > >> > source code, and the required behaviour seems to be missing, but
> > > >> > I am not sure whether it was left out for a reason.
> > > >> >
> > > >> > Please give me some suggestions on how to handle this scenario.
> > >> >
> > >> > Thanks,
> > >> > Jilani
> > >> >
> > > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik
> > > >> > <jilani2423@gmail.com> wrote:
> > >> >
> > >> >> Hi,
> > >> >>
> > > >> >> We have a scenario where we are importing data into HBase with
> > > >> >> Sqoop incremental import.
> > > >> >>
> > > >> >> Let's say we imported a table, and later some columns in the
> > > >> >> source table were updated to NULL for some rows. After an
> > > >> >> incremental import, as per HBase, these columns should no
> > > >> >> longer be present for those rows in the HBase table. But right
> > > >> >> now the columns remain as they were, with their previous values.
> > > >> >>
> > > >> >> Is there any fix to overcome this issue?
> > >> >>
> > >> >>
> > >> >> Thanks,
> > >> >> Jilani
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>
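
The difference between the current and the requested behaviour described in the thread above can be illustrated with a minimal Python sketch. It does not use Sqoop or the HBase client API: a plain dictionary stands in for the HBase table, and the names `apply_put_only`, `apply_put_and_delete` and `initial_table` are hypothetical, chosen only for this illustration. It is not the patch discussed in the thread, just a model of the two semantics.

```python
from typing import Dict, Optional

# column -> value as read from the source row (None models SQL NULL)
Row = Dict[str, Optional[str]]
# row key -> {column -> value}; a toy stand-in for an HBase table (assumption)
Table = Dict[str, Dict[str, str]]


def apply_put_only(table: Table, row_key: str, source_row: Row) -> None:
    """Model of the reported behaviour: NULL columns are skipped,
    so cells written by an earlier import survive."""
    cells = table.setdefault(row_key, {})
    for column, value in source_row.items():
        if value is not None:
            cells[column] = value


def apply_put_and_delete(table: Table, row_key: str, source_row: Row) -> None:
    """Model of the requested behaviour: a NULL column removes
    the existing cell, non-NULL columns are written as before."""
    cells = table.setdefault(row_key, {})
    for column, value in source_row.items():
        if value is None:
            cells.pop(column, None)  # drop the stale cell if it exists
        else:
            cells[column] = value


def initial_table() -> Table:
    # A subset of rowkey "1" from the initial import shown in the thread.
    return {
        "1": {
            "pay:amount": "4.99",
            "pay:rental_id": "573",
            "pay:last_update": "2017-01-23 05:29:09.0",
        }
    }


# Incremental update in which rental_id has become NULL in the source.
update: Row = {
    "pay:amount": "4.99",
    "pay:rental_id": None,
    "pay:last_update": "2017-02-05 05:29:09.0",
}

current = initial_table()
apply_put_only(current, "1", update)

expected = initial_table()
apply_put_and_delete(expected, "1", update)
```

In this model, `current` still holds `pay:rental_id` with the old value 573, while `expected` no longer has that column for rowkey "1"; both carry the updated `pay:last_update`.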
