nifi-users mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: MergeRecord performance
Date Mon, 27 Apr 2020 13:25:48 GMT
Robert,

What kind of performance degradation were you seeing here? I put together some simple flows
to see if I could reproduce using 1.9.2 and current master.
My flow consisted of GenerateFlowFile (generating 2 CSV rows per FlowFile) -> ConvertRecord
(to Avro) -> MergeRecord (read Avro, write Avro) -> UpdateAttribute to try to mimic
what you’ve got, given the details that I have.

I did see a performance degradation on the order of about 10%. So on my laptop I went from
processing 2.49 MM FlowFiles in 1.9.2 in 5 mins to 2.25 MM on the master branch. Interestingly,
I saw no real change when I enabled Snappy compression.
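
As a rough check on the figures above, the arithmetic can be sketched as follows. This is only an illustrative calculation from the two totals quoted; the helper names are made up for the example and are not part of NiFi or of the test flow.

```python
# Totals quoted above: FlowFiles processed in a 5-minute run on each build.
RUN_SECONDS = 5 * 60
flowfiles_1_9_2 = 2_490_000   # 2.49 MM on NiFi 1.9.2
flowfiles_master = 2_250_000  # 2.25 MM on the master branch


def per_second(count, seconds=RUN_SECONDS):
    """Average throughput in FlowFiles per second over the run."""
    return count / seconds


def percent_drop(before, after):
    """Relative decrease from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


rate_old = per_second(flowfiles_1_9_2)
rate_new = per_second(flowfiles_master)
drop = percent_drop(flowfiles_1_9_2, flowfiles_master)

print(f"{rate_old:.0f} -> {rate_new:.0f} FlowFiles/s, {drop:.1f}% drop")
```

That works out to roughly 8,300 versus 7,500 FlowFiles per second, a drop of a little under 10%.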

For a point of reference, I also tried removing MergeRecord and just Generate -> Convert
-> UpdateAttribute. I saw the same roughly 10% performance degradation.

I’m curious if you’re seeing more than that. If so, I think a template would be helpful
to understand what’s different.

Thanks
-Mark


On Apr 24, 2020, at 4:50 PM, Robert R. Bruno <rbruno8@gmail.com> wrote:

Joe,

In that part of the flow, we are using Avro readers and writers. We are using Snappy compression
(which could be part of the problem). Since we are using Avro at that point, the reader uses
the embedded schema, and the writer uses the schema name property along with an internal
schema registry in NiFi.

I can see what could potentially be shared.

Thanks

On Fri, Apr 24, 2020 at 4:41 PM Joe Witt <joe.witt@gmail.com> wrote:
Robert,

Can you please detail the record readers and writers involved and how schemas are accessed?
There can be very important performance-related changes in the parsers/serializers of the
given formats. And we've added a lot to make schema caching really capable, but you have to
opt into it. It is of course possible that MergeRecord itself is the culprit for the performance
reduction, but let's get a fuller picture here.

Are you able to share a template and sample data which we can use to replicate?

Thanks

On Fri, Apr 24, 2020 at 4:38 PM Robert R. Bruno <rbruno8@gmail.com> wrote:
I wanted to see if anyone else has experienced performance issues with the newest version
of NiFi and MergeRecord. We have been running on NiFi 1.9.2 for a while now, and recently
upgraded to NiFi 1.11.4. Once upgraded, our identical flows were no longer able to keep up
with our data, mainly at the MergeRecord processors.

We ended up downgrading back to NiFi 1.9.2. Once we downgraded, everything was keeping up again.
There were no errors to speak of when we were running the flow with 1.11.4. We did see higher
load on the OS, but this may have been caused by the tremendous backlog that had
built up in the flow.

Another side note: we saw one UpdateRecord processor producing errors when I tested
with NiFi 1.11.4 using a small test flow. I was able to fix this issue by changing some parameters
in my RecordWriter. So perhaps some underlying change in how records are handled since 1.9.2
caused the performance issue we saw?

Any insight anyone has would be greatly appreciated, as we would very much like to upgrade
to NiFi 1.11.4. One thought was switching the MergeRecord processors to MergeContent, since
I've been told MergeContent seems to perform better, but I'm not sure if this is actually true.
We are using the pattern of chaining a few MergeRecord processors together to help with performance.

Thanks in advance!
