nifi-users mailing list archives

From: Etienne Jouvin <>
Subject: Re: Merge content Defrag with high activity
Date: Tue, 22 Oct 2019 15:20:45 GMT
Thanks for the answer.

Reading the source code, I saw how complex and powerful it is.

In fact, I need multiple tasks, and I may loop over the "same" file.
To summarize, it is something like this:

An object has X versions.
For the first version:
* Fork the object to one branch that makes an external call and gets the
result.
* Fork the object to a second branch that does "nothing", then merge the
result of the first fork into it.
Increment the version number and do it again.
And so on, looping over all versions.

And for "performance", I try to do this on 50 objects concurrently.
In most cases the merge works. But sometimes the bin is released as
"ready" without having reached the expected content.

Anyway, as you said, I had to tune all the parameters.
I changed the fragment identifier so that it includes the version number,
instead of being a value common to all versions.
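A minimal sketch of the attributes each fork could carry so that the Defragment strategy pairs them, assuming pairing is driven by the standard fragment.identifier, fragment.index and fragment.count attributes. The class name, method names, the "-v" separator and the choice of which branch gets index 0 are hypothetical, chosen only for illustration:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of NiFi: builds the attributes for the two
// forks of one object version. Only the attribute names fragment.identifier,
// fragment.index and fragment.count come from NiFi; the rest is illustrative.
public class ForkAttributes {

    // Identifier that includes the version, so each loop iteration gets its
    // own bin instead of sharing one identifier across all versions.
    static String fragmentId(String objectId, int version) {
        return objectId + "-v" + version;
    }

    // Both forks share the identifier and the count (2), and differ by index.
    // Hypothetical convention: index 0 = "do nothing" branch,
    // index 1 = branch that made the external call.
    static List<Map<String, String>> forkAttributes(String objectId, int version) {
        String id = fragmentId(objectId, version);
        return List.of(
            Map.of("fragment.identifier", id, "fragment.index", "0", "fragment.count", "2"),
            Map.of("fragment.identifier", id, "fragment.index", "1", "fragment.count", "2"));
    }

    public static void main(String[] args) {
        // Each version of the same object gets a distinct identifier.
        for (int version = 1; version <= 3; version++) {
            System.out.println(forkAttributes("obj-42", version));
        }
    }
}
```

With a shared identifier, every version of "obj-42" would compete for the same bin key; folding the version in keeps each iteration's pair separate.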

I also tried setting Concurrent Tasks for the merge to 1 with a Run Schedule
of 0.1 second, but it was "pretty slow", not nearly as fast as I expected.

But now, with the new identifier and the maximum number of bins set much
higher than the expected need (6x more), it works as expected.
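A simplified, hypothetical model of why the bin limit matters: bins are keyed by fragment identifier, at most maxBins can be open, and when a new bin is needed at the limit the oldest open bin is forced out. The sketch assumes this is roughly what MergeContent does when its maximum number of bins is reached; it is not NiFi's BinFiles code, and all names in it are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical, simplified bin tracker with a cap on open bins.
// NOT NiFi's BinFiles/MergeContent implementation.
public class BoundedBins {

    private final int maxBins;        // analogous to the maximum number of bins
    private final int fragmentCount;  // fragments expected per bin (2 here)
    // Insertion-ordered map, so the first key is the oldest open bin.
    private final LinkedHashMap<String, List<String>> open = new LinkedHashMap<>();
    final List<String> merged = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    BoundedBins(int maxBins, int fragmentCount) {
        this.maxBins = maxBins;
        this.fragmentCount = fragmentCount;
    }

    void offer(String fragmentId, String payload) {
        List<String> bin = open.get(fragmentId);
        if (bin == null) {
            // A new bin is needed; at the cap, force the oldest bin out first.
            if (open.size() >= maxBins) {
                String oldestId = open.keySet().iterator().next();
                List<String> oldestBin = open.remove(oldestId);
                finish(oldestId, oldestBin);
            }
            bin = new ArrayList<>();
            open.put(fragmentId, bin);
        }
        bin.add(payload);
        if (bin.size() == fragmentCount) {
            // Bin is complete: close it and record a successful merge.
            open.remove(fragmentId);
            finish(fragmentId, bin);
        }
    }

    private void finish(String id, List<String> bin) {
        if (bin.size() == fragmentCount) {
            merged.add(id);
        } else {
            // Same kind of failure as reported in this thread.
            failed.add(id + ": expected " + fragmentCount
                    + " fragments but found only " + bin.size());
        }
    }

    public static void main(String[] args) {
        // Three objects start at once; their second fragments arrive later.
        String[] ids = {"a-v1", "b-v1", "c-v1"};
        for (int maxBins : new int[] {2, 6}) {
            BoundedBins bins = new BoundedBins(maxBins, 2);
            for (String id : ids) bins.offer(id, "passthrough");
            for (String id : ids) bins.offer(id, "external-call-result");
            System.out.println("maxBins=" + maxBins
                    + " merged=" + bins.merged + " failed=" + bins.failed);
        }
    }
}
```

With only 2 bins for 3 objects in flight, every incomplete bin is pushed out before its partner arrives and nothing merges; with 6 bins all three pairs complete. This matches the advice to size the bin limit for the number of pairs in flight rather than to add tasks.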

I will keep monitoring it anyway.


Etienne Jouvin

On Tue, Oct 22, 2019 at 16:59, Joe Witt <> wrote:

> Hello
> You should only have 1 or a few tasks at most for this processor.
> Scheduling can be frequent, but it is best to try different options and
> see what works for your case.
> This processor is relatively difficult to configure correctly as it is a
> complex case and has powerful options.  What you will need to watch out for
> is the maximum number of bins it can track at once.  If each bin is to hold
> at least and at most 2 things and lots of data is arriving, then what you
> need are lots of bins, so focus on that setting.
> Thanks
> On Tue, Oct 22, 2019 at 10:49 AM Etienne Jouvin <>
> wrote:
>> Hi,
>> Here is the case.
>> There is high activity, and I use a MergeContent processor.
>> I set up MergeContent with 300 concurrent tasks and no schedule,
>> meaning Run Schedule set to 0.
>> Minimum Number of Entries : 2
>> Maximum Number of Entries : 2
>> No limit on the size.
>> In some cases, I hit an exception:
>> because the expected number of fragments is 2 but found only 1 fragments
>> I believe I am hitting a side effect.
>> Maybe I have multiple executions at the same time, and some bins are
>> considered full and returned for processing. But when returned, the
>> object does not contain all the expected flowfiles, and during the
>> execution of the function processBins in class BinFiles, I hit the
>> exception.
>> It seems that I avoid this error when setting Concurrent Tasks to 1.
>> But it slows the process down a little.
>> Should I keep 300 concurrent tasks and set some schedule, something like
>> 0.1 second?
>> Regards
>> Etienne Jouvin
