nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: NiFi 1.8.0 LoadBalance Strategy Issue for Connection between Funnel and FetchSFTP
Date Thu, 08 Nov 2018 20:38:02 GMT
Hi Josef,

The prioritizers provide a weak ordering to the data, not an absolute sorting.
What I mean by that is that if you are prioritizing a FlowFile with attribute
A = 123 over a FlowFile with attribute A = 125, then the first one will likely
go first, but it's not guaranteed. For example, when you have Load Balanced
connections, that Connection between your Funnel and FetchSFTP actually
consists of 8 different queues: one for each node in your cluster. Within each
of those queues, the FlowFiles are prioritized according to your configured
Prioritizers, but there is no ordering across the queues. So you're not
guaranteed to process everything sequentially according to the Prioritizer.
Data that is swapped out can also change the 'absolute ordering' of FlowFiles.
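
To make the per-node queue point concrete, here is a small toy simulation in
Python. It is only a sketch and not how NiFi is implemented: the function
names, the 4-node cluster, and the per-node processing speeds are all made up
for illustration. Each node sorts only its own queue, so the combined order
drifts as soon as one node works faster than the others.

```python
from collections import deque


def round_robin_partition(flowfiles, node_count):
    """Distribute FlowFiles across per-node queues in round-robin order."""
    queues = [[] for _ in range(node_count)]
    for i, flowfile in enumerate(flowfiles):
        queues[i % node_count].append(flowfile)
    return queues


def simulate(priorities, node_count, speeds):
    """Return the cluster-wide order in which FlowFiles get processed.

    Each node prioritizes only its own queue. speeds[n] is the
    (hypothetical) number of FlowFiles node n handles per tick.
    """
    queues = [deque(sorted(q))
              for q in round_robin_partition(priorities, node_count)]
    processed = []
    while any(queues):
        for node, queue in enumerate(queues):
            for _ in range(speeds[node]):
                if queue:
                    processed.append(queue.popleft())
    return processed


# 16 FlowFiles; a lower value means "should go first".
priorities = list(range(100, 116))
order = simulate(priorities, node_count=4, speeds=[3, 1, 1, 1])
print(order)
# [100, 104, 108, 101, 102, 103, 112, 105, 106, 107, 109, 110, 111, 113, 114, 115]
```

Every node processed its own FlowFiles in priority order, yet 104 and 108
were handled before 101 because node 0 was faster.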

Now, that being said, you should get a 'rough ordering' close to what you would
expect. From what you have shown here, though, I think only the Connection
between the Funnel and FetchSFTP is using Prioritizers. This means that it will
sort the data that it currently holds according to your Prioritizer, but the
Funnel is feeding in the data from its incoming Connections, and those are not
prioritized. So you'll want to ensure that the Connections between
UpdateAttribute and the Funnel are also configured with Prioritizers.
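
The effect of an unprioritized upstream Connection can be sketched the same
way. This is again a toy model in Python rather than NiFi code: `drain` and
its `capacity` parameter are hypothetical names, with `capacity` standing in
for a small back pressure object threshold on the prioritized queue. The
queue always releases the best item it holds, but it can only choose among
the items that have already arrived.

```python
import heapq


def drain(upstream, capacity):
    """Pass `upstream` through a prioritized queue holding at most `capacity` items."""
    heap, out = [], []
    source = iter(upstream)
    while True:
        # Refill from upstream until the queue is full or upstream is empty.
        for item in source:
            heapq.heappush(heap, item)
            if len(heap) >= capacity:
                break
        if not heap:
            return out
        # Release the highest-priority (smallest) item currently held.
        out.append(heapq.heappop(heap))


unprioritized_upstream = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

print(drain(unprioritized_upstream, capacity=3))
# [7, 6, 5, 4, 3, 2, 1, 0, 8, 9]

print(drain(sorted(unprioritized_upstream), capacity=3))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With the upstream in arbitrary order, the small queue can only fix the order
locally. Once the upstream is ordered as well, the output comes out fully
sorted in this single-queue model.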

Sorry for the wordiness. Hopefully this makes sense. If not, please let us know.

Thanks
-Mark



On Nov 8, 2018, at 2:55 AM, <Josef.Zahner1@swisscom.com> wrote:

Hi guys

We have an 8-node NiFi cluster and run a ListSFTP on the primary node. After
the ListSFTP we add some attributes and send the FlowFiles over a funnel to the
FetchSFTP. On the connection between the funnel and the FetchSFTP we have an
“Object Threshold” of 100, some “Prioritizer” and round robin load balancing to
get the files in a sorted order. Right after start we had about 800 files
(expected value due to 8 nodes) in the queue between the funnel and the
FetchSFTP, but after a few hours (we get about 200k-250k files from each
ListSFTP processor) the number of files decreased to the number below. However,
it seems that all nodes get load, because after the FetchSFTP we see a more or
less evenly distributed load.

The next issue, or maybe misunderstanding, is that we would like to have all
the ListSFTP files from the four folders in a sorted order. So we added the
priority attribute, to which we assign the epoch time in seconds extracted from
the filename. However, there seems to be no humanly understandable logic to how
the files get sorted in the queue between the funnel and the FetchSFTP, because
after a few hours I see files with nearly the oldest and the newest possible
timestamp in our DB (which shouldn’t be possible as we have the priority
attribute with epoch time). Is there a flaw in our understanding of how NiFi
works here? Should we remove the funnel and connect the UpdateAttribute
processor directly to the FetchSFTP? Or how can we overcome the ordering issue?

Thanks in advance,
Josef


<image001.png>


<image002.png>
