nifi-users mailing list archives

From Boris Tyukin <bo...@boristyukin.com>
Subject Re: CaptureChangeMySQL and Triggers
Date Fri, 18 Jan 2019 18:09:49 GMT
Hi Vitaly,

I do not work for a CDC vendor, but there are many reasons why CDC tools
exist, and no NiFi processor will ever match the benefits of these commercial
tools. If timestamps are reliable, extracts are fast, and you can handle
inserts, updates, deletes and primary key updates without impacting the
performance of your source database - great!

Unfortunately, in my experience, this is the exception rather than the rule.
Vendors design extremely poor databases, and developers and DBAs delete or
change data without updating timestamps. One of the projects I worked on
recently did not even have timestamps, and the tables were very large (over
1B rows).
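
To make the timestamp problem concrete, here is a minimal sketch of
watermark-style polling, roughly what an ExecuteSQL-based extract would do.
Everything in it is an assumption for illustration: the table `src`, its
columns, the helper `poll_changes`, and an in-memory SQLite database standing
in for the real source system.

```python
import sqlite3


def poll_changes(conn, last_seen):
    """Return rows with updated_at > last_seen, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, val, updated_at FROM src "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # Keep the old watermark when nothing new was found.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical source table; SQLite is only a stand-in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE src (id INTEGER PRIMARY KEY, val TEXT, updated_at INTEGER)"
)
conn.execute("CREATE INDEX src_updated_at ON src (updated_at)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 20), (3, "c", 30)],
)

# First poll picks up all three rows.
first_batch, mark = poll_changes(conn, 0)

# A well-behaved update that also bumps the timestamp.
conn.execute("UPDATE src SET val = 'b2', updated_at = 40 WHERE id = 2")
# An update that forgets to touch the timestamp.
conn.execute("UPDATE src SET val = 'a2' WHERE id = 1")
# A hard delete: no row is left to carry a timestamp.
conn.execute("DELETE FROM src WHERE id = 3")

# Second poll sees only the well-behaved update.
second_batch, mark = poll_changes(conn, mark)
print(second_batch)
```

In this sketch the second poll returns only the row with id 2. The change to
id 1 and the deletion of id 3 never show up, which is the kind of gap that
reading the database log is meant to close.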

CDC tools allow mining the database log efficiently and reliably. Another use
case for CDC tools is online replication (for data redundancy, backups, or
offloading reporting queries from the production system).

It just seems to me you have not encountered a project/use case where such a
tool is a must.

Here is an example for you... We recently purchased GoldenGate, which can
stream data into Kafka. While we had timestamps in the source database, they
were unreliable for many reasons, including people. Another reason: our source
systems are beefy Oracle RAC clusters that are under extreme load 24x7.
Lots of analysts and devs used them for reporting purposes, which greatly
impacted performance for people who need to perform their duties.

There is also a lot of complexity when it comes to mining database logs.
First, these things are platform/vendor dependent. Second, serious
commercial RDBMSs like Oracle have tons of settings and deployment options,
log file rotation rules, backups, clustering, load balancing, you name it.

A CDC tool was our option to solve that issue with Oracle and that specific
system. At the same time, I worked with another vendor system where we
could rely on timestamps and they would never delete data.

Hope this sheds a bit of light on "change data capture logs marketed
advantages" ;)


On Fri, Jan 18, 2019 at 12:07 PM Vitaly Krivoy <Vitaly_Krivoy@jhancock.com>
wrote:

> This is a follow-on question to Apache/HortonWorks, rather than an answer
> to the question posted by Marcelo. Outside of CaptureChangeMySQL, are there
> plans underway to add similar processors for other databases? I realize
> that a database would have to produce a capture data change log for this
> feature to be implemented inexpensively. One of the objections that I
> constantly have to face in my organization to NiFi adoption is that
> ExecuteSQL would require polling, thus affecting performance of the source
> DBMS system. I realize that this objection is silly and if the
> modification/creation timestamp column in the table is indexed, selecting
> the rows that have been added/modified after the last run date in ExecuteSQL
> would barely affect the server. But as a consulting architect, I have to
> deal with non-technical clients who make their decisions based on buzz
> words and they have heard of change data capture logs marketed advantages.
> Thanks.
>
>
>
> *From:* Marcelo Terres <mhterres@gmail.com>
> *Sent:* Thursday, January 17, 2019 5:51 AM
> *To:* users@nifi.apache.org
> *Subject:* CaptureChangeMySQL and Triggers
>
>
>
> Hello.
>
>
>
> Is anyone here using the CaptureChangeMySQL processor to get data from a
> table whose data is generated and managed by triggers/stored procedures?
>
>
>
> I'm having some weird issues, such as data not being processed in case of
> specific inserts and also some weird data being generated in case of simple
> operations (3 objects in one update operation, for example).
>
>
>
> Thanks in advance,
>
>
>
> Regards,
>
>
>
> Marcelo H. Terres
>
> <mhterres@gmail.com>
> https://www.mundoopensource.com.br
> https://twitter.com/mhterres
> https://linkedin.com/in/marceloterres
>
