sqoop-user mailing list archives

From Jilani Shaik <jilani2...@gmail.com>
Subject Re: sqoop hbase incremental import - Sqoop 1.4.6
Date Fri, 10 Mar 2017 05:10:15 GMT
Hi Bogi,

Thanks for providing direction.

As you suggested, I explored further, resolved the issue, and was able to
test the fix with trunk-based code changes on my Hadoop cluster.

Root cause of my issue: the 1.4.6 code base uses the same Avro version
that is in my Hadoop cluster, so there is no issue with that jar, whereas
the trunk code base uses the avro-1.8.1 jar, which is not available in my
Hadoop cluster.
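The Avro mismatch described above can be confirmed by scanning the lib directories on the classpath for Avro jars. This is a minimal sketch; the directory paths are assumptions based on the MapR layout mentioned in this thread, so point them at your actual installation:

```shell
# Sketch: list Avro jars in candidate lib directories to spot version
# conflicts. Paths below are illustrative, not authoritative.
for dir in /opt/mapr/sqoop/sqoop-1.4.6/lib \
           /opt/mapr/hadoop/*/share/hadoop/common/lib; do
  if [ -d "$dir" ]; then
    find "$dir" -name 'avro-*.jar'   # more than one version is a conflict risk
  fi
done
echo "avro scan done"
```

If an older Avro jar appears alongside avro-1.8.1, the older one may be loaded first and produce the NoClassDefFoundError on org/apache/avro/LogicalType shown later in this thread.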

Can you suggest how to run the unit tests for this component?

I tried the "test" target, and the tests are failing as below.

Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.415 sec
    [junit] Running com.cloudera.sqoop.TestDirectImport
    [junit] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
13.705 sec
    [junit] Test com.cloudera.sqoop.TestDirectImport FAILED
    [junit] Running com.cloudera.sqoop.TestExport
    [junit] Tests run: 17, Failures: 0, Errors: 17, Skipped: 0, Time
elapsed: 22.564 sec
    [junit] Test com.cloudera.sqoop.TestExport FAILED
    [junit] Running com.cloudera.sqoop.TestExportUpdate

Do I need to make any changes? I am running the "test" target from Eclipse.
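Instead of the full suite, a single test class can be run while debugging. This is a sketch only: the `testcase` property name should be verified against build.xml in your checkout, and note that many Sqoop tests need a reachable database, which alone can produce mass failures in an IDE.

```shell
# Sketch: run one test class from the Sqoop source root via ant.
# The testcase property name is an assumption; confirm it in build.xml.
ant test -Dtestcase=TestDirectImport
```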

Thanks,
Jilani


On Thu, Mar 9, 2017 at 9:42 PM, Jilani Shaik <jilani2423@gmail.com> wrote:

> Hi Bogi,
>
> - Prepared jar using trunk with "jar-all" target
>
> - Copied the jar to /opt/mapr/sqoop/sqoop-1.4.6/
>
> - Moved out existing jar to some other location
>
> - Then executed the below command to do the import:
> sqoop import --connect jdbc:mysql://10.0.0.300/database123 --verbose
> --username test --password test123$ --table payment -m 2 --hbase-table
> /database/demoapp/hbase/payment --column-family pay --hbase-row-key
> payment_id --incremental lastmodified --merge-key payment_id --check-column
> last_update --last-value '2017-01-08 08:02:05.0'
>
>
> I followed the same steps for both the jar from trunk code and the jar
> from 1.4.6 branch code.
>
> Where are you suggesting the multiple Avro jars come in: at the time of
> jar preparation, or when running the command using the jar?
>
>
> Thanks,
> Jilani
>
> On Thu, Mar 9, 2017 at 9:21 AM, Boglarka Egyed <bogi@cloudera.com> wrote:
>
>> Hi Jilani,
>>
>> I suspect that you have an old version of Avro or even multiple Avro
>> versions on your classpath and thus Sqoop uses an older one.
>>
>> Could you please provide a list of the exact commands you have performed
>> so that I can reproduce the issue?
>>
>> Thanks,
>> Bogi
>>
>> On Thu, Mar 9, 2017 at 2:51 AM, Jilani Shaik <jilani2423@gmail.com>
>> wrote:
>>
>>> Can someone give me pointers on what I am missing between the trunk and
>>> 1.4.6 builds that causes the error mentioned in the mail chain below?
>>>
>>> I followed the same ant target to prepare the jar for both branches, but
>>> the 1.4.6 jar still behaves differently from the 1.4.7 jar created from
>>> trunk.
>>>
>>> Thanks,
>>> Jilani
>>>
>>>
>>> On Wed, Mar 8, 2017 at 3:29 AM, Jilani Shaik <jilani2423@gmail.com>
>>> wrote:
>>>
>>> > Hi Bogi,
>>> >
>>> > I am getting the below error when I prepare the jar from trunk and try
>>> > to do a sqoop import with a MySQL database table, whereas similar
>>> > changes are working with branch 1.4.6.
>>> >
>>> >
>>> > 17/03/08 01:06:25 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7-SNAPSHOT
>>> > 17/03/08 01:06:25 DEBUG tool.BaseSqoopTool: Enabled debug logging.
>>> > 17/03/08 01:06:25 WARN tool.BaseSqoopTool: Setting your password on the
>>> > command-line is insecure. Consider using -P instead.
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
>>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Loaded manager factory:
>>> > com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
>>> > org.apache.sqoop.manager.oracle.OraOopManagerFactory
>>> > 17/03/08 01:06:25 DEBUG oracle.OraOopManagerFactory: Data Connector for
>>> > Oracle and Hadoop can be called by Sqoop!
>>> > 17/03/08 01:06:25 DEBUG sqoop.ConnFactory: Trying ManagerFactory:
>>> > com.cloudera.sqoop.manager.DefaultManagerFactory
>>> > 17/03/08 01:06:25 DEBUG manager.DefaultManagerFactory: Trying with scheme:
>>> > jdbc:mysql:
>>> > Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/LogicalType
>>> >         at org.apache.sqoop.manager.DefaultManagerFactory.accept(DefaultManagerFactory.java:67)
>>> >         at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:184)
>>> >         at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:270)
>>> >         at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:97)
>>> >         at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:617)
>>> >         at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
>>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>> >         at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
>>> >         at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
>>> >         at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
>>> > Caused by: java.lang.ClassNotFoundException: org.apache.avro.LogicalType
>>> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>> >         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>> >         ... 11 more
>>> >
>>> > Please let me know what is missing and how to resolve this exception.
>>> > Let me know if you need further details.
>>> >
>>> > Thanks,
>>> > Jilani
>>> >
>>> > On Wed, Mar 1, 2017 at 4:38 AM, Boglarka Egyed <bogi@cloudera.com>
>>> > wrote:
>>> >
>>> >> Hi Jilani,
>>> >>
>>> >> This is an example: SQOOP-3053
>>> >> <https://issues.apache.org/jira/browse/SQOOP-3053> with the review
>>> >> <https://reviews.apache.org/r/54206/> linked. Please make your changes
>>> >> on trunk as it will be used to cut the future release, so your patch
>>> >> definitely needs to be able to apply on it.
>>> >>
>>> >> Thanks,
>>> >> Bogi
>>> >>
>>> >> On Wed, Mar 1, 2017 at 3:46 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > Hi Bogi,
>>> >> >
>>> >> > Can you provide me sample JIRA tickets and review requests similar to
>>> >> > this one, so I can proceed further?
>>> >> >
>>> >> > I applied the code changes from the Sqoop git branch
>>> >> > "sqoop-release-1.4.6-rc0". If you suggest the right branch, I will
>>> >> > take the code from there and apply the changes before submitting the
>>> >> > review request.
>>> >> >
>>> >> > Thanks,
>>> >> > Jilani
>>> >> >
>>> >> > On Mon, Feb 27, 2017 at 3:05 AM, Boglarka Egyed <bogi@cloudera.com>
>>> >> > wrote:
>>> >> >
>>> >> >> Hi Jilani,
>>> >> >>
>>> >> >> To get your change committed please do the following:
>>> >> >> * Open a JIRA ticket for your change in Apache's JIRA system
>>> >> >> <https://issues.apache.org/jira/browse/SQOOP/> for project Sqoop
>>> >> >> * Create a review request at Apache's review board
>>> >> >> <https://reviews.apache.org/r/> for project Sqoop and link it to the
>>> >> >> JIRA ticket
>>> >> >>
>>> >> >> Please consider the guidelines below:
>>> >> >>
>>> >> >> Review board
>>> >> >> * Summary: generate your summary using the issue's jira key + jira title
>>> >> >> * Groups: add the relevant group so everyone on the project will know
>>> >> >> about your patch (Sqoop)
>>> >> >> * Bugs: add the issue's jira key so it's easy to navigate to the jira side
>>> >> >> * Repository: sqoop-trunk for Sqoop1 or sqoop-sqoop2 for Sqoop2
>>> >> >> * And as soon as the patch gets committed, it's very useful for the
>>> >> >> community if you close the review and mark it as "Submitted" at the
>>> >> >> Review board. The button to do this is top right at your own tickets,
>>> >> >> right next to the Download Diff button.
>>> >> >>
>>> >> >> Jira
>>> >> >> * Link: please add the link of the review as an external/web link so
>>> >> >> it's easy to navigate to the reviews side
>>> >> >> * Status: mark it as "patch available"
>>> >> >>
>>> >> >> Sqoop community will receive emails about your new ticket and review
>>> >> >> request and will review your change.
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Bogi
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Feb 25, 2017 at 2:14 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> wrote:
>>> >> >>
>>> >> >> > Do we have any update?
>>> >> >> >
>>> >> >> > I checked out the 1.4.6 code, made the code changes to achieve this,
>>> >> >> > and tested them in the cluster, where they work as expected. Is
>>> >> >> > there a way I can contribute this as a patch, so that the committers
>>> >> >> > can validate it further and suggest any changes required to move
>>> >> >> > forward? Please suggest the approach.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Jilani
>>> >> >> >
>>> >> >> > On Sun, Feb 5, 2017 at 10:41 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > wrote:
>>> >> >> >
>>> >> >> > > Hi Liz,
>>> >> >> > >
>>> >> >> > > Let's say we inserted data into a table with the initial import;
>>> >> >> > > it looks like this in hbase shell:
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-01-23 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id,
>>> >> >> > > timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2006-02-15 22:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=4526
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Now assume that in the source, rental_id becomes NULL for rowkey
>>> >> >> > > "1", and then we do an incremental import into HBase. With the
>>> >> >> > > current import, the final HBase data after the incremental import
>>> >> >> > > will look like this.
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:rental_id,
>>> >> >> > > timestamp=1485129654025, value=573
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > As the source column "rental_id" becomes NULL for rowkey "1", the
>>> >> >> > > final HBase table should not have "rental_id" for rowkey "1". I am
>>> >> >> > > expecting the below data for these rowkeys.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >  1                                     column=pay:amount,
>>> >> >> > > timestamp=1485129654025, value=4.99
>>> >> >> > >  1                                     column=pay:customer_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  1                                     column=pay:last_update,
>>> >> >> > > timestamp=1485129654025, value=2017-02-05 05:29:09.0
>>> >> >> > >  1                                     column=pay:payment_date,
>>> >> >> > > timestamp=1485129654025, value=2005-05-25 11:30:37.0
>>> >> >> > >  1                                     column=pay:staff_id,
>>> >> >> > > timestamp=1485129654025, value=1
>>> >> >> > >  10                                    column=pay:amount,
>>> >> >> > > timestamp=1485129504390, value=5.99
>>> >> >> > >  10                                    column=pay:customer_id,
>>> >> >> > > timestamp=1485129504390, value=1
>>> >> >> > >  10                                    column=pay:last_update,
>>> >> >> > > timestamp=1485129504390, value=2017-02-05 05:12:30.0
>>> >> >> > >  10                                    column=pay:payment_date,
>>> >> >> > > timestamp=1485129504390, value=2005-07-08 03:17:05.0
>>> >> >> > >  10                                    column=pay:rental_id,
>>> >> >> > > timestamp=1485129504390, value=126
>>> >> >> > >  10                                    column=pay:staff_id,
>>> >> >> > > timestamp=1485129504390, value=2
>>> >> >> > >
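The expected behavior above amounts to deleting the cell whose source value went NULL instead of leaving the stale value in place. In hbase shell terms (a sketch only, using the table, rowkey, and column from this example; requires an HBase client on the PATH):

```shell
# Sketch: remove the stale cell by hand; table/row/column are taken from
# the example above. Requires the hbase CLI to be installed.
echo "delete '/database/demoapp/hbase/payment', '1', 'pay:rental_id'" | hbase shell
```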
>>> >> >> > >
>>> >> >> > > Please let me know if anything further is required.
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > Thanks,
>>> >> >> > > Jilani
>>> >> >> > >
>>> >> >> > > On Tue, Jan 31, 2017 at 3:38 AM, Erzsebet Szilagyi
>>> >> >> > > <liz.szilagyi@cloudera.com> wrote:
>>> >> >> > >
>>> >> >> > >> Hi Jilani,
>>> >> >> > >> I'm not sure I completely understand what you are trying to do.
>>> >> >> > >> Could you give us some examples with e.g. 4 columns and 2 rows of
>>> >> >> > >> example data showing the changes that happen compared to the
>>> >> >> > >> changes you'd like to see?
>>> >> >> > >> Thanks,
>>> >> >> > >> Liz
>>> >> >> > >>
>>> >> >> > >> On Tue, Jan 31, 2017 at 5:18 AM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > >> wrote:
>>> >> >> > >>
>>> >> >> > >> >
>>> >> >> > >> > Please help in resolving the issue. I am going through the
>>> >> >> > >> > source code, and somehow the required behavior is missing, but
>>> >> >> > >> > I am not sure whether it was deliberately left out for some
>>> >> >> > >> > reason.
>>> >> >> > >> >
>>> >> >> > >> > Please give me some suggestions on how to handle this scenario.
>>> >> >> > >> >
>>> >> >> > >> > Thanks,
>>> >> >> > >> > Jilani
>>> >> >> > >> >
>>> >> >> > >> > On Sun, Jan 22, 2017 at 6:45 PM, Jilani Shaik <jilani2423@gmail.com>
>>> >> >> > >> > wrote:
>>> >> >> > >> >
>>> >> >> > >> >> Hi,
>>> >> >> > >> >>
>>> >> >> > >> >> We have a scenario where we are importing data into HBase with
>>> >> >> > >> >> sqoop incremental import.
>>> >> >> > >> >>
>>> >> >> > >> >> Let's say we imported a table, and later the source table got
>>> >> >> > >> >> updated with null values in some columns for some rows. After
>>> >> >> > >> >> the incremental import, those columns should no longer be
>>> >> >> > >> >> present in the HBase table, but right now they remain as they
>>> >> >> > >> >> were, with their previous values.
>>> >> >> > >> >>
>>> >> >> > >> >> Is there any fix to overcome this issue?
>>> >> >> > >> >>
>>> >> >> > >> >>
>>> >> >> > >> >> Thanks,
>>> >> >> > >> >> Jilani
>>> >> >> > >> >>
>>> >> >> > >> >
>>> >> >> > >> >
>>> >> >> > >>
>>> >> >> > >
>>> >> >> > >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >> >
>>> >>
>>> >
>>> >
>>>
>>
>>
>
