nifi-users mailing list archives

From Mark Payne <>
Subject Re: MergeRecord, queue & backpressure
Date Fri, 13 Apr 2018 14:49:47 GMT

In that case you're looking to merge about 500,000 FlowFiles into a single FlowFile, so you'll
definitely want to use a cascading approach. I'd shoot for about 1 MB for the first MergeRecord
and then merge 128 of those together for the second MergeRecord.
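
As a sketch, the two-stage cascade described above might be configured like this on the two MergeRecord processors (the property names are MergeRecord's; the exact size and age values are assumptions you'd tune for your flow):

```
# First MergeRecord -- pack ~300-byte FlowFiles into ~1 MB bundles
Merge Strategy   = Bin-Packing Algorithm
Minimum Bin Size = 1 MB
Maximum Bin Size = 1.5 MB    # assumed headroom; tune as needed
Max Bin Age      = 5 min     # flush partially filled bins eventually

# Second MergeRecord -- pack ~128 of the 1 MB bundles into ~128 MB
Merge Strategy   = Bin-Packing Algorithm
Minimum Bin Size = 128 MB
Max Bin Age      = 15 min
```

Setting a Max Bin Age is a safety valve so a bin that never fills (for example, at the tail of a batch) is still merged and released after that time.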

The provenance backpressure is occurring because of the large number of provenance events
generated. One event will be generated, more or less, each time a Processor touches a FlowFile.
So if you merge the FlowFiles together as early as possible, you'll reduce the load that
you're putting on the Provenance Repository.

Also, depending on how you're getting the data into your flow, if you're able, it is best
to receive a larger "micro-batch" of records per FlowFile to begin with and not split them up.
This would greatly alleviate the pressure on the Provenance Repository and avoid the need
for multiple MergeRecord processors as well.

Also, of note, there is a newer implementation of the Provenance Repository that you can
switch to by changing the "nifi.provenance.repository.implementation" property in nifi.properties
from "org.apache.nifi.provenance.PersistentProvenanceRepository"
to "org.apache.nifi.provenance.WriteAheadProvenanceRepository". The Write-Ahead version is
quite a bit faster and behaves differently than the Persistent Provenance Repository, so you
won't see those warnings about provenance backpressure.
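
Concretely, the change above is a one-line edit in nifi.properties (confirm the class names against your NiFi version's documentation, and note that a restart is required for it to take effect):

```
# nifi.properties -- switch the provenance repository implementation
# Before:
# nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
# After:
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository
```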

I hope this helps!

> On Apr 13, 2018, at 10:30 AM, DEHAY Aurelien <> wrote:
> Hello.
> It's me again regarding my mergerecord question.
> I still don't manage to get what I want. I think I may have understood how bin-based processors
work; this message is for clarification, plus a question regarding performance.
> I want to merge a huge number of ~300-byte FlowFiles into 128 MB Parquet files.
> My understanding is that, for MergeRecord to be able to create a bin with 128 MB of data,
that data must already be in the queue. We can't feed the bin one FlowFile at a time, so working
with small FlowFiles I have to set the backpressure threshold to something really high, or
remove the FlowFile-count backpressure limit completely.
> I understood from my reading that this is not the "good" way to do it, and that I should
instead cascade multiple merges to grow the FlowFiles "slowly"?
> I've made some tests with a single level, but I hit the "provenance recording rate" limit.
Will multiple levels help?
> Thanks for any help.
> Aurélien.
