nifi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kessler, Jon" <Kessl...@varentech.com>
Subject [Discuss] Data prioritization - A proposed solution
Date Thu, 17 Oct 2019 11:24:57 GMT
I want to start a discussion about a new prioritization mechanism that addresses some of the
issues that I believe exist in the current solution. These issues are:

 - Scheduling: No consideration is given to data priority when determining which component
is given the next available thread with which to work
 - Constant sorting: Because all flowfiles in a given connection share the same PriorityQueue
they must be sorted every time they move. While this sort is efficient it can add up as queues
grow deep.
 - Administration: There is a costly human element to managing the value used as a priority
ranking as priorities change. You must also ensure every connection in the appropriate flow
has the proper prioritizer assigned to it to make use of the property.

We have developed a prototype of a new FlowFileQueue implementation that addresses these issues.
Use of this implementation is controlled via nifi.properties so you can opt-in or out system-wide
without doing a lot of configuration. Its design goals are:

  - Instead of using the value of a FlowFile attribute as a ranking, maintain a set of expression
language rules to define your priorities. The highest ranked rule that a given FlowFile satisfies
will be that FlowFile's priority
  - Because we have a finite set of priority rules we can utilize a bucket sort in our connections.
One bucket per priority rule. The bucket/rule with which a FlowFile is associated with will
be maintained so that as it moves through the system we do not have to re-evaluate that Flowfile
against our ruleset unless we have reason to do so.
  - Control where in your flow FlowFiles are evaluated against the ruleset with a new Prioritizer
implementation: BucketPrioritizer.
  - When this queue implementation is polled it will be able to check state to see if any
data of a higher priority than what it currently contains recently (within 5s) moved elsewhere
in the system. If higher priority data has recently moved elsewhere, the connection will only
provide a FlowFile X% of the time where X is defined along with the rule. This allows higher
priority data to have more frequent access to threads without thread-starving lower priority
data.
  - Rules will be managed via a menu option for the flow and changes to them take effect instantly.
This allows you to change your priorities without stopping/editing/restarting various components
on the graph.

I intend to contribute this solution but first want to solicit input and opinions.

  - Jon Kessler


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message