nifi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Corey Flowers <>
Subject Re: Interactive Queue management
Date Fri, 09 Oct 2015 14:22:13 GMT
Can we also get the ability to clear the bulletins from a processor on
restart? Sometimes it would be easier to troubleshoot a processor if
you could clear the bulletins.

Sent from my iPhone

> On Oct 9, 2015, at 10:16 AM, Matt Gilman <> wrote:
> Joe,
> We've been receiving a lot of feedback lately regarding the Purge/Clear
> Queue capability. Because of this we'd like to introduce that feature into
> the 0.4.0 release with the Viewing and Removing of individual FlowFiles
> into a subsequent release. The Purge/Clear Queue capability is a small
> portion of the full Queue Management feature that we can introduce
> independently and quickly.
> I looked through your NiFi Fork specifically at the Purge/Clear
> functionality. There are some additional considerations that I mentioned
> before specifically around FlowFile swapping and submitting the Purge/Clear
> request asynchronously that we need to account for. We have created a
> branch (NIFI-730) for doing this work. We'd love for you to work with us on
> this aspect if you are interested.
> Thanks!
> Matt
>> On Fri, Oct 2, 2015 at 3:16 PM, Matt Gilman <> wrote:
>> Joe,
>> Yes, as Mark mentioned it is definitely awesome that your interested in
>> digging in here. Most of the discussions regarding this feature have been
>> really high level at this point. So we're happy to work through some of the
>> details as Mark has begun. A couple points that come to mind right now.
>> - I don't think we want to support manually prioritization. The
>> connections can be configured with prioritizers and we'd like to use those
>> to manage the ordering of the enqueued FlowFiles. However, the listing of
>> FlowFiles will be rendered by their priority by default though will likely
>> support sorting by any of the fields.
>> - The current thought process is that we'll want to require source and
>> destination components to be stopped. This is inline with the existing
>> functionality throughout the application.
>> - The number of enqueued FlowFiles is technically unbounded. Because of
>> this the endpoint may be need to support some sort of pagination since we'd
>> may not want/be able to return the entire queue in a single response. There
>> is some concern about Java heap since many of the flowfiles may be swapped
>> out to disk. Additionally there are some concerns about HTTP response size
>> and the amount of data we store client side.
>> - What we do with the FlowFiles that are swapped out to disk is still
>> undecided. Not sure whether we want to load them from disk in order to
>> include them in the response or if we just show that X number of FlowFiles
>> are currently swapped out.
>> Some of these items will need to be hashed out but we're happy to work
>> through them with you. We should keep the Feature Proposal up to date as
>> well [1].
>> Thanks!
>> Matt
>> [1]
>>> On Thu, Oct 1, 2015 at 12:34 PM, Mark Payne <> wrote:
>>> Joe,
>>> First of all, it is awesome that you're interested in jumping on this!
>>> And I think you're off to a great
>>> start and have a really good understanding of exactly where we all want
>>> to go with this.
>>> I'm sure there will be a lot of questions that will come up in working
>>> through a lot of the
>>> stuff here. Just from reading through the email here i have a couple of
>>> comments/thoughts that
>>> may help to shape the way forward. This is a bit of a stream of
>>> consciousness, so I hope all
>>> makes sense :)
>>> The connectionQueueItem model that you lay out here, I think is really
>>> just a FlowFile.
>>> I think it will make sense to just use the name flowFile.
>>> When you bring up the contents of a queue in the UI, I would imagine that
>>> it would be shown
>>> as something similar to the Data Provenance table. From there I'd want to
>>> click on the FlowFile
>>> in the table to see more details. So I'm envisioning two separate data
>>> models really. The first
>>> would be maybe a FlowFileSummary. It would look very similar to what
>>> you've laid out below,
>>> but perhaps contain information about how long the FlowFile has been
>>> queued up, perhaps
>>> how many times it has been re-queued on this particular queue (for
>>> example, if a FlowFile keeps
>>> failing to process, we could use this information to remove that
>>> particular FlowFile from the
>>> queue, etc.)
>>> When we get more info for the FlowFile, I would expect it to contain all
>>> FlowFile Attributes. This
>>> I think is a different data model because if we pull back all attributes
>>> for every FlowFile when
>>> we render the table, the amount of data brought back could be huge.
>>> Another consideration here, is that when a connection has a lot of
>>> FlowFiles on it, the framework
>>> may swap those FlowFiles out to disk in order to remove them from the
>>> Java heap. We will
>>> want to ensure that we include info about how much is in the queue (# of
>>> FlowFiles and size of those
>>> FlowFiles), how much is swapped out (# of FlowFiles + size), and how much
>>> is currently being
>>> processed by Processors (in the FlowFileQueue this is referenced as
>>> Unacknowledged FlowFiles).
>>> In the UI table, we should also make sure that by default we are showing
>>> the FlowFiles in the order
>>> in which they exist in the queue right now.
>>> From a RESTful perspective, we may want to also consider that in order to
>>> purge a queue, we are not
>>> really deleting the queue itself, but rather its contents. So perhaps we
>>> should use a URI like
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/contents
>>> <
>>> http://your-host/nifi-api/controller/process-groups/%7Bprocess-group-id%7D/connections/%7Bconnection-id%7D/queue/contents
>>> but I'll be the first to admit that REST is not really my forte. So if
>>> that doesn't make sense then ignore that.
>>> Very excited to see you jumping in here!
>>> Thanks
>>> -Mark
>>>>> On Oct 1, 2015, at 12:13 PM, József Mészáros <
>>>>> wrote:
>>>> Hey NiFi experts :-)
>>>> I have started to work on the backend part of interactive queue
>>> management,
>>>> which has several related issues: NIFI-99 (Review in flight flow file
>>>> details) <>,NIFI-108
>>>> <>,NIFI-730 (Purge queue
>>> from
>>>> UI) <>,NIFI-139
>>> (Distribution
>>>> of FlowFiles on a connection)
>>>> <>. There is a feature
>>>> proposal description [1], which helps you to get a quick overview.
>>>> I hope I made the first step to move forward with this topic in a
>>>> "standard" and "good" direction. I tried to think generally, and making
>>> the
>>>> changes considering all the requested improvements from backend
>>>> perspective. The basic idea was to extend the web-api with a new
>>> endpoint
>>>> for managing a connection queue (used e.g. by the UI). Based on the
>>>> mentioned issues, I created the following new "methods":
>>>>  - Get the content of the connection queue
>>>> *GET*
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
>>>> Response: List of connection queue items
>>>>  - Clear (purge) the connection queue
>>>> *DELETE*
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue
>>>>  - Remove a single item from the connection queue
>>>> *DELETE*
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
>>>>  - Get a single item from the connection queue
>>>> *GET*
>>> http://your-host/nifi-api/controller/process-groups/{process-group-id}/connections/{connection-id}/queue/{flow-file-uuid}
>>>> Response: Single connection queue item
>>>> The connection queue item looks like this (JSON):
>>>> "connectionQueueItem": {
>>>>       "flowFileId": 16,
>>>>       "flowFileUuid": "92c74b41-005e-444d-8f9e-f9cbc01af5f2",
>>>>       "fileSize": "42 bytes",
>>>>       "fileSizeBytes": 42,
>>>>       "fileName": "filename.tsv",
>>>>       "entryDate": 1443546148763,
>>>>       "lineageStartDate": 1443546148763,
>>>>       "contentClaimSection": "1",
>>>>       "contentClaimContainer": "default",
>>>>       "contentClaimIdentifier": "1443546148763-1",
>>>>       "contentClaimOffset": 0
>>>>   }
>>>> If the flow file has a priority attribute, it is also included as a
>>> numeric
>>>> value.
>>>> It contains enough information for "View content" and "Download content"
>>>> panels and from the frontend perspective, it could be implemented in a
>>>> similar way. If you would like to purge the connection queue, or a
>>> single
>>>> item, you just have to make an HTTP DELETE request. And off course your
>>> are
>>>> able to make statistics and review the content of the queue with the
>>> first
>>>> method. Updating an item (e.g. reorder the queue) can result a new
>>> method,
>>>> or covered by a delete + put for a single item with a new priority
>>> value.
>>>> You can find my commits in the following repo :
>>>> in branch NIFI-108
>>>> <>. I do not want to
>>> make a
>>>> pull request until the backend is not in a mergeable state.
>>>> It is important to mention, that the backend code is not complete, and
>>> at
>>>> some points it maybe requires some reshaping, but you can see the basic
>>>> concept, and the direction. Before making any new commits, I wanted to
>>>> share the current state with you, and start/initiate a conversation
>>> about
>>>> the topic, and to get feedback, whether is it a good contribution, or
>>> not.
>>>> So guys, the questions are open :-)
>>>> Regards,
>>>> Joe
>>>> [1]

View raw message