nifi-users mailing list archives

From Matt Burgess <>
Subject Re: PutDatabaseRecord performance issues.
Date Tue, 15 Oct 2019 15:21:35 GMT

What are your MergeRecord settings? Unfortunately since
CaptureChangeMySQL was developed before the Record API was available,
the processor emits one flow file per event. This can certainly cause
performance issues at PutDatabaseRecord. This may be the first time
I've suggested this :) but you may find you're better off with
ConvertJsonToSQL -> PutSQL and setting the "Batch Size" property of
PutSQL to a high number. This should effectively do a "merge"
(actually it grabs as many flow files as it can up to Batch Size for
one execution) and execute the SQL statement(s).
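To make the batching behavior concrete, here is a minimal sketch (plain Python, not NiFi code) of what "grabs as many flow files as it can up to Batch Size for one execution" means for throughput; the event strings and the Batch Size of 100 are illustrative, not taken from any real flow:

```python
from collections import deque

def run_batched(queue, batch_size):
    """One PutSQL-style execution: pull up to batch_size items
    off the front of the queue and return them as a single batch,
    preserving their FIFO order."""
    batch = []
    while queue and len(batch) < batch_size:
        batch.append(queue.popleft())
    return batch  # in a real flow these would run as one JDBC batch

# hypothetical stream of binlog-derived SQL statements
events = deque(f"INSERT ... /* event {i} */" for i in range(250))

executions = []
while events:
    executions.append(run_batched(events, 100))

# 250 single-statement flow files now take only 3 executions
# instead of 250, and ordering is preserved within and across batches
```

Because each execution commits many statements at once, the per-flow-file overhead (session, transaction, provenance) is amortized, which is why a single concurrent task with a high Batch Size can beat multiple concurrent tasks while keeping the binlog order intact.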

It seems like a CaptureChangeMySQLRecord processor would be a good
idea, or perhaps adding "Max Records Per FlowFile" and an optional
"Record Writer" property to the existing CaptureChangeMySQL processor,
so it can write multiple records per flowfile, sparing the need for a
Merge processor or explicit conversion to SQL. I presume the choice
between a new processor or augmenting the existing one would depend on
whether there's a common schema for all events. Please feel free to
write an Improvement Jira to cover this.
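The proposed "Max Records Per FlowFile" property would amount to grouping consecutive CDC events into multi-record flow files at the source. A rough sketch of that grouping, assuming a hypothetical property value of 4 (the property name and event shape here are assumptions, since this processor behavior does not exist yet):

```python
def group_events(events, max_records_per_flowfile):
    """Group individual CDC events into multi-record 'flow files',
    as a hypothetical Max Records Per FlowFile property might do.
    Event order is preserved both within and across groups."""
    for i in range(0, len(events), max_records_per_flowfile):
        yield events[i:i + max_records_per_flowfile]

events = [{"op": "insert", "id": n} for n in range(10)]
flowfiles = list(group_events(events, 4))
# 10 events become flow files of 4, 4, and 2 records
```

Emitting records already grouped like this is what would spare the downstream MergeRecord step entirely.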


On Tue, Oct 15, 2019 at 7:58 AM
<> wrote:
> I am using CaptureChangeMySQL to extract the binlog, do some transformations, and then
write to another database using PutDatabaseRecord. Now the PutDatabaseRecord processor
is a performance bottleneck.
> If I set the PutDatabaseRecord processor's concurrency larger than 1, there will be ordering
issues: the order in which binlog events reach the destination database will not match the order
they came in. But with a concurrency of one, the TPS is only about 80/s.
> Even if I add a MergeRecord before PutDatabaseRecord, the TPS is no more than 300.
> Anybody have any idea about this?
> Thanks,
> Lei
