nifi-users mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: TailFile: Wildcards and filenames
Date Sun, 15 Nov 2015 14:54:22 GMT
Andre,

I love that you're digging in here and making sure that the quality is here! Thank you.

So in your example here, you said that you have Rolling Filename Pattern set to: *

That should match any file in the directory (except that it won't count the
actual file being tailed). From the description you gave, it sounds as if you
are expecting * to match anything starting with test1, i.e., test1*. Instead,
it will match anything in the directory.
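
To make the difference concrete, here is a small sketch in Python. It uses the
standard fnmatch module as a stand-in for glob-style matching, which is an
assumption for illustration only and not the Processor's actual code. The helper
name rolled_candidates is hypothetical.

```python
from fnmatch import fnmatchcase

def rolled_candidates(names, pattern, tailed):
    """Return names matching the glob pattern, excluding the tailed file itself."""
    return [n for n in names if n != tailed and fnmatchcase(n, pattern)]

# File names taken from the test described in the quoted message.
names = ["test1", "test2", "test3", "test4", "test11", "test12"]

# "*" matches everything in the directory except the tailed file.
print(rolled_candidates(names, "*", "test1"))

# "test1*" matches only names that start with test1.
print(rolled_candidates(names, "test1*", "test1"))
```

With "*" every other file is a candidate; with "test1*" only test11 and test12 are.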

So with that in mind, I do believe that what you are seeing is the expected
behavior, as you are really hitting some corner cases here.

In order to understand how this is functioning, we need to consider how some
corner cases are handled. First, the Processor doesn't scan for files that have
rolled over each time it runs. It scans only when the File to Tail has rolled
over (i.e., when that file has been truncated). This is done because
continually scanning the directory for any new files would be very expensive
and generally is not necessary for a rolling file pattern. So if you wrote to
test2, then test3, it would not notice them until test1 rolls over.
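
As a rough sketch of that idea in Python (not the Processor's actual
implementation): the directory scan happens only when the tailed file looks
truncated. Detecting truncation by comparing the file size against the last read
position is an assumption made here for simplicity, and the names
has_rolled_over, run_once, and scan_directory are hypothetical.

```python
import os

def has_rolled_over(path, last_position):
    """Assume a rollover if the file is now shorter than what was already read."""
    return os.path.getsize(path) < last_position

def run_once(path, last_position, scan_directory):
    """One scheduled run: scan for rolled files only when truncation is detected."""
    if has_rolled_over(path, last_position):
        scan_directory()      # the expensive step, done only on rollover
        last_position = 0     # start reading the new file from the beginning
    with open(path, "rb") as f:
        f.seek(last_position)
        data = f.read()       # read whatever was appended since the last run
    return data, last_position + len(data)
```

On an ordinary run where the file has only grown, scan_directory is never
called, so files such as test2 or test3 go unnoticed until test1 itself rolls
over.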

Also, when the file is rolled over, it will look for other files that have
rolled over, but it will ignore any file whose Last Modified date is before
that of the just-rolled-over file. So if you write to file test2, then test3,
and then you append to test1, it will not pick up test2 and test3, as their
timestamps come before that of the file that you were tailing. While this may
seem erroneous given the test that you are providing here, it does work well
for true "rolling file" scenarios, which is what this Processor is aiming to
address.
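
The timestamp rule can be sketched like this in Python. This is an illustration
under assumptions, not the Processor's code: the function name files_to_recover
and the timestamps are made up, and the candidates are passed in as a plain
dict of file name to last-modified time.

```python
def files_to_recover(candidates, rolled_mtime):
    """Keep only files modified at or after the just-rolled-over file.

    candidates: dict mapping file name -> last-modified timestamp (seconds).
    rolled_mtime: last-modified timestamp of the just-rolled-over file.
    Returns the kept names, oldest first.
    """
    kept = [name for name, mtime in candidates.items() if mtime >= rolled_mtime]
    return sorted(kept, key=candidates.get)

# test2 and test3 were written before test1 was last touched (hypothetical times).
older = {"test2": 100, "test3": 110}
print(files_to_recover(older, rolled_mtime=120))

# A file modified after the cutoff would be picked up.
mixed = {"test2": 100, "test3": 110, "test1.1": 125}
print(files_to_recover(mixed, rolled_mtime=120))
```

In the first call nothing is kept, which matches test2 and test3 being ignored;
in the second only test1.1 survives the cutoff.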

One thing that I am noticing, as I review this myself, is that if a file rolls
over multiple times while the Processor is running, it does not pick up all of
the changes. I will be addressing this shortly. I created a ticket [1] for
this. This probably is okay for most use cases, but it could potentially miss
some updates to the file if the file is written at a high rate and the
Processor is not scheduled to run very often.

Does all of this make sense? Anything that I'm missing?

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1171


> On Nov 15, 2015, at 12:39 AM, Andre <andre-lists@fucs.org> wrote:
> 
> Hi there,
> 
> I am trying to push the boundaries of the TailFile processor and
> noticed an interesting behavior:
> 
> First I configure the File to Tail to "/log_path/test1"
> 
> I then configure "Rolling Filename Pattern" to "*"
> 
> I then start the processor and generate data:
> 
> $ echo AAAA > test1
> $ echo AAAA > test1
> $ echo BBBB > test1
> $ echo CCCC > test1
> 
> Until here everything goes by the book. All data is tailed. No losses.
> 
> 
> I then test my workaround to NIFI-1170
> 
> $ echo DDDD > test2
> $ echo EEEE > test3
> $ echo FFFF > test4
> 
> Data is not ingested. So far so good, we are dealing with a hack after all. :-)
> 
> However to my surprise when I tested the following, NiFi failed to
> identify the two lines being fed into files matching the Rolling
> Filename Pattern Expression:
> 
> $ echo GGGG > test11
> $ echo HHHH > test12
> 
> I then stopped the processor and restarted. NiFi then proceeds to
> ingest the data present in all files without duplication.
> 
> Is that the expected behaviour?
> 
> Cheers

