nifi-users mailing list archives

From Vitaly Krivoy <Vitaly_Kri...@jhancock.com>
Subject RE: CaptureChangeMySQL and Triggers
Date Mon, 21 Jan 2019 15:45:41 GMT
It does. Thank you Boris.

From: Boris Tyukin <boris@boristyukin.com>
Sent: Friday, January 18, 2019 1:10 PM
To: users@nifi.apache.org
Subject: Re: CaptureChangeMySQL and Triggers

Hi Vitaly,

I do not work for a CDC vendor, but there are many reasons why CDC tools exist, and no NiFi
processor will ever match the benefits of these commercial tools. If timestamps are reliable,
extracts are fast, and you can handle inserts, updates, deletes and primary key updates without
impacting the performance of your source database - great!

Unfortunately, in my experience, this is the exception rather than the rule. Vendors design
extremely poor databases, and developers and DBAs delete or change data without updating
timestamps. One of the projects I worked on recently did not even have timestamps, and the
tables were very large (over 1B rows).
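
The failure mode described above can be reproduced in a few lines. This is only a sketch under
stated assumptions: the `accounts` table, its columns and the `poll` helper are hypothetical, and
SQLite stands in for the real source database. It shows a delete and an update that leaves the
timestamp untouched, both of which a timestamp-based poll never sees.

```python
import sqlite3

# Hypothetical source table; SQLite is used only so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, modified_at TEXT)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (1, 100, "2019-01-10 08:00:00"),
        (2, 200, "2019-01-10 08:00:00"),
    ],
)


def poll(conn, since):
    """Timestamp-based extract: rows modified strictly after the last run."""
    return conn.execute(
        "SELECT id, balance FROM accounts WHERE modified_at > ?", (since,)
    ).fetchall()


last_run = "2019-01-15 00:00:00"

# After the last run, someone deletes a row and changes another one
# without touching modified_at.
conn.execute("DELETE FROM accounts WHERE id = 1")
conn.execute("UPDATE accounts SET balance = 999 WHERE id = 2")

# The poll returns nothing, although two changes happened.
missed = poll(conn, last_run)
remaining = conn.execute("SELECT id, balance FROM accounts").fetchall()
```

A log-based CDC tool reads both changes from the database log, which is why it does not depend on
the timestamp column being maintained.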

CDC tools allow mining the database log efficiently and reliably. Another use case for CDC tools
is online replication (for data redundancy, backups, or offloading reporting queries
from the production system).

It just seems to me you have not encountered a project/use case where such a tool is a must.

Here is an example for you. We recently purchased GoldenGate, which can stream data into Kafka.
While we had timestamps in the source database, they were unreliable for many reasons, including
people. Another reason: our source systems are beefy Oracle RAC clusters that are under extreme
load 24x7. Lots of analysts and developers used them for reporting, which greatly impacted
performance for the people who need them to perform their duties.

There is also a lot of complexity when it comes to mining database logs. First, these things
are platform/vendor dependent. Second, serious commercial RDBMSs like Oracle have tons of
settings and deployment options: log file rotation rules, backups, clustering, load balancing,
you name it.

A CDC tool was our way to solve that issue with Oracle and that specific system. At the same
time, I worked with another vendor's system where we could rely on timestamps and they would
never delete data.

Hope this sheds a bit of light on "change data capture logs marketed advantages" ;)


On Fri, Jan 18, 2019 at 12:07 PM Vitaly Krivoy <Vitaly_Krivoy@jhancock.com> wrote:
This is a follow-on question to Apache/HortonWorks, rather than an answer to the question
posted by Marcelo. Outside of CaptureChangeMySQL, are there plans underway to add similar
processors for other databases? I realize that a database would have to produce a change
data capture log for this feature to be implemented inexpensively. One of the objections to
NiFi adoption that I constantly have to face in my organization is that ExecuteSQL
would require polling, thus affecting performance of the source DBMS. I realize that
this objection is silly: if the modification/creation timestamp column in the table is indexed,
selecting the rows that have been added/modified after the last run date in ExecuteSQL would
barely affect the server. But as a consulting architect, I have to deal with non-technical
clients who make their decisions based on buzzwords, and they have heard of the marketed
advantages of change data capture logs. Thanks.
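
The polling approach described above can be sketched roughly as follows. This is a minimal
illustration, not NiFi's ExecuteSQL itself: the `orders` table, its columns, the index name and
the `fetch_changes` helper are all hypothetical, and SQLite stands in for the source DBMS. It
assumes timestamps are stored as ISO-formatted text, so string comparison matches time order.

```python
import sqlite3


def fetch_changes(conn, last_run):
    """Return rows added or modified strictly after the saved watermark."""
    cur = conn.execute(
        "SELECT id, payload, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_run,),
    )
    return cur.fetchall()


# Hypothetical source table with an index on the timestamp column,
# so the range filter does not need a full table scan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, modified_at TEXT)"
)
conn.execute("CREATE INDEX idx_orders_modified_at ON orders (modified_at)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "a", "2019-01-17 09:00:00"),
        (2, "b", "2019-01-18 10:00:00"),
        (3, "c", "2019-01-18 11:30:00"),
    ],
)

# The watermark is the last run date remembered between polls.
watermark = "2019-01-18 00:00:00"
rows = fetch_changes(conn, watermark)
if rows:
    # Advance the watermark to the newest timestamp seen in this batch.
    watermark = rows[-1][2]
```

Each poll touches only the rows past the watermark, which is why the load on the server stays
small as long as the timestamp column is indexed and actually maintained.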

From: Marcelo Terres <mhterres@gmail.com>
Sent: Thursday, January 17, 2019 5:51 AM
To: users@nifi.apache.org
Subject: CaptureChangeMySQL and Triggers

Hello.

Is anyone here using the CaptureChangeMySQL processor to get data from a table whose data is
generated and managed by triggers/stored procedures?

I'm having some weird issues, such as data not being processed for specific inserts, and also
some weird data being generated for simple operations (3 objects in one update operation,
for example).

Thanks in advance,

Regards,

Marcelo H. Terres
<mhterres@gmail.com>
https://www.mundoopensource.com.br
https://twitter.com/mhterres
https://linkedin.com/in/marceloterres

STATEMENT OF CONFIDENTIALITY The information contained in this email message and any attachments
may be confidential and legally privileged and is intended for the use of the addressee(s)
only. If you are not an intended recipient, please: (1) notify me immediately by replying
to this message; (2) do not use, disseminate, distribute or reproduce any part of the message
or any attachment; and (3) destroy all copies of this message and any attachments.
