nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: How to get a complete listing of flowfiles in a queue?
Date Tue, 14 Apr 2020 12:08:21 GMT
James

Using the provenance events from this processor is the best way.  Grab all
receive events for the time period of interest.

You can do this in a few ways, but one that works well is to send the
provenance events out via a reporting task, filter the events for that
component, write them out to a file or set of files, and review.  I think
we have an example of this on our wiki.
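
As a rough illustration of the "filter events for that component" step, here
is a minimal Python sketch. It assumes the reporting task output has been
written as one JSON object per line, and that each event carries
`componentId`, `eventType`, and `updatedAttributes` fields; those field names,
the function name, and the file layout are assumptions for illustration, so
check them against what your reporting task actually emits.

```python
import json
from typing import Iterable, Iterator, Optional, Set


def listed_files(
    lines: Iterable[str],
    component_id: str,
    event_types: Optional[Set[str]] = None,
) -> Iterator[dict]:
    """Yield filename and timestamp attributes for one component's events.

    `lines` is an iterable of JSON strings, one provenance event per line
    (assumed export format). `event_types`, if given, restricts the event
    types that are kept; None keeps every type.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        event = json.loads(line)
        if event.get("componentId") != component_id:
            continue  # event belongs to some other component
        if event_types is not None and event.get("eventType") not in event_types:
            continue
        attrs = event.get("updatedAttributes") or {}
        if "filename" not in attrs:
            continue  # nothing to report for this event
        yield {
            "filename": attrs["filename"],
            "file.lastModifiedTime": attrs.get("file.lastModifiedTime"),
            "file.creationTime": attrs.get("file.creationTime"),
        }
```

Passing an open file handle as `lines` and the ListFile processor's UUID as
`component_id` would give one record per listed file, which can then be
written out as CSV or plain text for review.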

Thanks

On Tue, Apr 14, 2020 at 7:57 AM James McMahon <jsmcmahon3@gmail.com> wrote:

> I have an issue with a ListFile processor. It does not appear to be
> consuming all the raw data files that show up throughout the day in a
> landing directory. My count at the end of the day is less than the count
> of all the files in the directory at the end of the day. I suspect it has
> to do either with the way ListFile has been configured (right now we only
> accept files that are 30 minutes old or older), or with the fact that
> large numbers of files can arrive at the same hh:mm, differentiated only
> by seconds or milliseconds. Perhaps ListFile is recording its state only
> to the hour-minute or hour-minute-second (I notice that all millisecond
> values in the epoch time are 000 in View State), so when ListFile runs
> its next cycle it overlooks all the other files that share the same
> hh:mm but are later by some seconds or milliseconds in file time? I'm
> grasping for a logical cause at this point.
>
> I want to do a comparison of what I have read in so far today against an
> exhaustive list of today's directory. My intention is that such a
> comparison should flag gaps, which then may lead me to a cause.
>
> I have routed the ListFile success path to a queue that retains its
> flowfiles for 24 hours, which I started after all of yesterday's files had
> stopped arriving (the point being that the queue will only hold flowfiles
> from today's directory). Right now it totals 16,231 flowfiles. The
> "read only" directory on the Linux system has nearly 20,000 files in it.
> Looking at the queue from the UI isn't quite what I require: it only lets
> me view 100 flowfiles, and I can't output the list.
>
> Can I use the API or some other option to generate the complete list of
> flowfiles in that queue? I hope to output a list that includes filename,
> file.lastModifiedTime, and file.creationTime.
> Thank you in advance for your help.
>
>
>
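
The gap check James describes (files on disk versus files actually listed)
could be sketched roughly as below. This is only an illustration: the
function name is hypothetical, it assumes the ingested filenames have already
been collected into a list (for example from exported provenance events), and
it only looks at regular files directly inside the landing directory, not in
subdirectories.

```python
import os
from typing import Iterable, List


def missing_files(directory: str, ingested: Iterable[str]) -> List[str]:
    """Return names of regular files in `directory` that are not in `ingested`.

    Non-recursive: only entries directly inside `directory` are considered.
    """
    seen = set(ingested)  # set gives fast membership checks for ~20,000 names
    on_disk = [entry.name for entry in os.scandir(directory) if entry.is_file()]
    return sorted(name for name in on_disk if name not in seen)
```

Checking the modification times of the returned names (for instance with
`os.path.getmtime`) may then show whether the skipped files cluster around
shared timestamps or around the 30-minute minimum age boundary.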
