I have an issue with a ListFile processor. It does not appear to be picking up all of the raw data files that arrive in a landing directory throughout the day: my flowfile count at the end of the day is lower than the number of files in the directory. I suspect one of two causes. The first is the way ListFile is configured (right now we only accept files that are at least 30 minutes old). The second is that large numbers of files can arrive within the same hh:mm, differing only by seconds or milliseconds. Perhaps ListFile records its state only to the minute or the second (I notice that the millisecond portion of every epoch value in View State is 000), so that on its next cycle it overlooks other files that share that hh:mm but have a file time a few seconds or milliseconds later? I'm grasping for a logical cause at this point.
I want to compare what I have read in so far today against an exhaustive listing of today's directory. Such a comparison should flag the gaps, which may then lead me to a cause.
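To make the comparison concrete, here is a rough Python sketch of what I have in mind. It assumes I can somehow obtain the filenames that ListFile has already emitted as a plain list of strings; the function name `find_unlisted_files` and its inputs are my own placeholders, not anything NiFi provides.

```python
import os
from datetime import datetime, timezone


def find_unlisted_files(directory, listed_names):
    """Return (filename, last-modified time) pairs for files in `directory`
    whose names do not appear in `listed_names`, oldest first.

    `listed_names` is assumed to be the filenames ListFile has emitted so far.
    """
    listed = set(listed_names)
    missing = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only regular files at the top level; subdirectories are ignored.
            if entry.is_file() and entry.name not in listed:
                missing.append((entry.stat().st_mtime, entry.name))
    missing.sort()  # sort by modification time, then by name
    return [
        (
            name,
            # Keep milliseconds so files sharing the same second stay visible.
            datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(
                timespec="milliseconds"
            ),
        )
        for mtime, name in missing
    ]
```

If the skipped files cluster around particular timestamps (for example, many files sharing the same second), that would support the timestamp-precision theory; if they are spread evenly, the cause is probably elsewhere.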
I have routed the ListFile success relationship into a queue that holds its flowfiles for 24 hours. I started it after all of yesterday's files had stopped arriving, so the queue only contains flowfiles from today's directory. Right now it totals 16,231 flowfiles, while the read-only directory on the Linux system has nearly 20,000 files in it. Looking at the queue from the UI isn't quite what I need: it only lets me view 100 flowfiles, and I can't export the list.
Can I use the API, or some other option, to generate the complete list of flowfiles in that queue? I hope to output a list that includes filename, file.lastModifiedTime, and file.creationTime.
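For reference, this is roughly the kind of API call I am imagining, sketched in Python. It assumes an unsecured NiFi at `http://localhost:8080` (a placeholder, as is the connection id passed in) and relies on my understanding of the REST API's `flowfile-queues` listing-request endpoints. I have not confirmed that this returns more than the 100 flowfiles the UI shows, which is really the heart of my question.

```python
import json
import time
import urllib.request

NIFI_API = "http://localhost:8080/nifi-api"  # assumption: unsecured, local NiFi


def _call(method, url):
    """Send a body-less request and decode the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summaries_to_rows(listing_request):
    """Extract (uuid, filename) pairs from a listing-request payload."""
    summaries = listing_request.get("flowFileSummaries") or []
    return [(s.get("uuid"), s.get("filename")) for s in summaries]


def list_queue(connection_id, base=NIFI_API, poll_seconds=0.5):
    """Start a listing request on a connection, wait for it, then clean up."""
    url = f"{base}/flowfile-queues/{connection_id}/listing-requests"
    listing = _call("POST", url)["listingRequest"]
    try:
        while not listing.get("finished"):
            time.sleep(poll_seconds)
            listing = _call("GET", f"{url}/{listing['id']}")["listingRequest"]
        return summaries_to_rows(listing)
    finally:
        # Listing requests are server-side objects and should be removed.
        _call("DELETE", f"{url}/{listing['id']}")


def flowfile_attributes(connection_id, flowfile_uuid, base=NIFI_API):
    """Fetch one flowfile's attributes, e.g. file.lastModifiedTime."""
    url = f"{base}/flowfile-queues/{connection_id}/flowfiles/{flowfile_uuid}"
    return _call("GET", url)["flowFile"]["attributes"]
```

As far as I can tell, the listing summaries carry the filename but not the other attributes, so file.lastModifiedTime and file.creationTime would need one `flowfile_attributes` call per flowfile. On a cluster, I believe that call also needs a `clusterNodeId` query parameter, which this sketch omits.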
Thank you in advance for your help.