sqoop-user mailing list archives

From wenxing zheng <wenxing.zh...@gmail.com>
Subject Re: "change data capture" or CDC support in the MySQL connector
Date Wed, 14 Dec 2016 09:32:02 GMT
Thanks, Boglarka. It seems Sqoop2 is quite different from Sqoop1.

On Sat, Dec 10, 2016 at 1:11 AM, Boglarka Egyed <bogi@cloudera.com> wrote:

> *About Sqoop2:*
> There are some incremental import related features implemented in Sqoop2,
> but I have no experience with it unfortunately.
> A good starting point could be for example to check the configuration of
> it here: https://github.com/apache/sqoop/blob/sqoop2/connector/connector-generic-jdbc/src/main/java/org/apache/sqoop/connector/jdbc/configuration/IncrementalRead.java
>
> *About using Sqoop1 incremental import on Hive tables:*
> Sqoop1 does not support incremental import with the --hive-import option
> because the Hive import functionality is not able to merge data.
>
> However, there is a workaround for this issue:
> * first, import all the data from the DB into the Hive table
> * then import the changed data with incremental import from the DB,
> setting the --target-dir option to the directory holding the Hive table
> data, and specifying the field delimiter, line delimiter and escape
> character used by Hive
>
> An example:
> * first:
> sqoop import --connect <conn> --username <uname> --password <pwd> --table
> <table_name> --num-mappers 1 --hive-import --hive-table <table_name>
>
> This command will create the Hive table and load all the data from the DB.
>
> * then:
> sqoop import --connect <conn> --username <uname> --password <pwd> --table
> <table_name> --incremental lastmodified --check-column <col> --merge-key id
> --num-mappers 1 --target-dir /user/hive/warehouse/<table_name>
> --last-value <last_value> --fields-terminated-by <field_del>
> --lines-terminated-by <lines_del> --escaped-by <esc>
>
> When the second command runs, it queries the records from the DB which
> were modified after <last_value>, puts them in a temporary directory, and
> then merges that result with the already imported data. The result is
> written to the target directory, which is the directory backing the Hive
> table, so the next time we query the <table_name> table in Hive we will
> see the merged data.
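The merge step described above can be sketched roughly as follows. This is a simplified Python illustration of the --merge-key semantics, not Sqoop's actual implementation; the column names and timestamps are made up:

```python
# Simulate Sqoop's lastmodified merge: for each merge-key value,
# the record with the newer check-column value wins.
def merge_by_key(existing, incoming, key="id", check_col="updated_at"):
    merged = {row[key]: row for row in existing}
    for row in incoming:
        current = merged.get(row[key])
        if current is None or row[check_col] >= current[check_col]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

existing = [
    {"id": 1, "name": "old", "updated_at": "2016-12-01"},
    {"id": 2, "name": "kept", "updated_at": "2016-12-01"},
]
incoming = [
    {"id": 1, "name": "new", "updated_at": "2016-12-09"},    # modified row
    {"id": 3, "name": "added", "updated_at": "2016-12-09"},  # new row
]
result = merge_by_key(existing, incoming)
```

Rows that were only updated replace their older copies, new rows are appended, and untouched rows pass through unchanged, which matches the behaviour described for the second sqoop command.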
>
> *About handling DELETION:*
> Sqoop incremental import doesn't handle deletion on the source; it
> supports two modes: append (for new rows) and lastmodified (an alternate
> strategy for updated rows).
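To see why deletions are missed: a row removed at the source simply never appears in the incremental extract, so nothing tells the merge to drop it. A minimal Python illustration with hypothetical data (not Sqoop code):

```python
# Row id=2 was deleted at the source after the initial import.
# The incremental extract only contains rows modified since last-value,
# so the deleted row is absent rather than flagged, and its old copy
# survives the merge in the target.
target = {1: "alice", 2: "bob"}             # already imported data
incremental_extract = {1: "alice-updated"}  # deleted row 2 is simply absent

target.update(incremental_extract)          # merge step keeps stale row 2
```

Detecting deletes therefore needs something outside incremental import, such as a periodic full re-import or soft-delete flags in the source table.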
>
> On Thu, Dec 8, 2016 at 2:56 AM, wenxing zheng <wenxing.zheng@gmail.com>
> wrote:
>
>> And another important case is how the incremental imports can cope with
>> the "DELETION" on the source.
>>
>> Thanks again,
>>
>> On Thu, Dec 8, 2016 at 9:42 AM, Buntu Dev <buntudev@gmail.com> wrote:
>>
>>> I'm interested in this as well and would like to sqoop data into a
>>> Hadoop cluster as a Hive table. How does one handle updates on the
>>> table using incremental imports?
>>>
>>> On Wed, Dec 7, 2016 at 5:29 PM, wenxing zheng <wenxing.zheng@gmail.com>
>>> wrote:
>>>
>>>> Thanks, Bogi. Is it available in Sqoop2?
>>>>
>>>> On Thu, Dec 8, 2016 at 4:16 AM, Boglarka Egyed <bogi@cloudera.com>
>>>> wrote:
>>>>
>>>>> Hi Wenxing,
>>>>>
>>>>> Yes, Sqoop is capable of supporting CDC; it can be performed via
>>>>> "incremental" imports.
>>>>>
>>>>> Please see the user guide for details:
>>>>> https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_incremental_imports
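In append mode, the mechanism boils down to remembering the largest check-column value seen so far and pulling only rows above it on the next run. A hedged Python sketch of that bookkeeping (Sqoop stores this state for you when you use a saved job; the data here is invented):

```python
def incremental_append(rows, check_col, last_value):
    """Return rows with check_col > last_value, plus the new last_value."""
    new_rows = [r for r in rows if r[check_col] > last_value]
    new_last = max((r[check_col] for r in new_rows), default=last_value)
    return new_rows, new_last

source = [{"id": 1}, {"id": 2}, {"id": 3}]
batch1, lv = incremental_append(source, "id", 0)   # first run imports all rows
source.append({"id": 4})                           # a new row arrives at the source
batch2, lv = incremental_append(source, "id", lv)  # second run imports only the new row
```

With --incremental lastmodified the check column is a timestamp instead of a monotonically increasing id, but the last-value tracking works the same way.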
>>>>>
>>>>> Cheers,
>>>>> Bogi
>>>>>
>>>>> On Wed, Dec 7, 2016 at 12:21 PM, wenxing zheng <
>>>>> wenxing.zheng@gmail.com> wrote:
>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> I am new to Sqoop, and would like to know whether the
>>>>>> architecture of Sqoop is able to support "change data capture"? Or
>>>>>> can I say that the "Generic JDBC Connector" is only for bulk data
>>>>>> transfer?
>>>>>>
>>>>>> Any hints would be appreciated.
>>>>>>
>>>>>> Thanks, Wenxing
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
