Great news, and thank you very much!
I have created a Jira for this. There’s currently a PR up for it as well.
On Apr 9, 2020, at 11:14 AM, Dobbernack, Harald (Key-Work) <email@example.com> wrote:
I can confirm after testing: if no provenance event is generated for longer than the configured nifi.provenance.repository.max.storage.time, then, as expected, the last recorded provenance events no longer exist. But from then on, new provenance events are not searchable either; the provenance search stays completely empty regardless of how many flows are active. As described, the *.prov file is then also missing from the provenance repository. After a restart of NiFi a new *.prov file is generated and provenance works again, but it only shows events generated since the last NiFi start.
So yes, I’d say your idea
‘If so, then I think that would explain why it deleted the data. It’s trying to age off old data
but unfortunately it doesn’t perform a check to first determine whether or not the “old file”
that it’s about to delete is also the “active file”.’
fits my test very nicely.
As a workaround we’re going to set a greater nifi.provenance.repository.max.storage.time until this can be resolved.
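To check the workaround against a given installation, here is a minimal sketch that reads the property from the text of nifi.properties and compares it with the longest gap you expect between provenance events. The helper names (read_property, parse_duration, retention_covers_idle_gap) are made up for illustration, and the duration parser only handles simple values such as "24 hours" or "120 hours"; NiFi itself may accept more spellings.

```python
import re
from datetime import timedelta

PROPERTY = "nifi.provenance.repository.max.storage.time"

# Units understood by this sketch only (an assumption, not NiFi's full list).
_UNIT_SECONDS = {"sec": 1, "second": 1, "min": 60, "minute": 60,
                 "hr": 3600, "hour": 3600, "day": 86400}


def read_property(properties_text, key):
    """Return the value of `key` from properties-file text, or None if absent."""
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            return value.strip()
    return None


def parse_duration(text):
    """Parse values such as '24 hours' into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    unit = match.group(2).lower().rstrip("s")  # 'hours' -> 'hour'
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported unit: {match.group(2)!r}")
    return timedelta(seconds=int(match.group(1)) * _UNIT_SECONDS[unit])


def retention_covers_idle_gap(retention, longest_idle_gap):
    """True if the retention period is longer than the longest expected idle gap."""
    return retention > longest_idle_gap
```

For example, with the 120 hours mentioned in this thread, a weekend without events (roughly 60 hours) would still be covered, whereas the 24 hour default would not.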
Thanks again for looking into this.
Thank you for looking into this.
The nifi.provenance.repository.max.storage.time setting might explain why I haven’t been experiencing the effect so often since changing from the default to 120 hours a few months ago 😉
But I believe provenance stopped working last time even though there was an ‘active’ flow: flow files sitting in a Wait processor, expiring every hour, going on to ‘send a message’ and then being rerouted to the same Wait processor. I would have expected this to generate provenance entries. As I am not 100% sure that the Wait processor was in use when provenance was last lost, I will check on a test system whether I can reproduce the provenance breakage when no flows are active for longer than nifi.provenance.repository.max.storage.time, and I will get back to you.
Hey Daren, Harald,
Thanks for the note. I have seen this once before but couldn’t figure out what caused it. Restarting addressed the issue.
I think I may understand the problem now, though, after looking at it again.
In nifi.properties, there is a property named “nifi.provenance.repository.max.storage.time” that defaults to “24 hours”.
Is it possible that you went 24 hours (or whatever value is set for that property) without generating any Provenance events?
If so, then I think that would explain why it deleted the data. It’s trying to age off old data but unfortunately it doesn’t perform a check to first determine whether or not the “old file” that it’s about to delete is also the “active file”.
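To make the suspected failure mode concrete, here is a minimal sketch of an age-off pass that includes the missing check. This is a hypothetical model, not NiFi’s actual implementation: the EventFile type, the select_for_age_off function and the file names are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EventFile:
    """Hypothetical stand-in for one provenance event file."""
    name: str
    last_event_time: datetime


def select_for_age_off(files, active_file_name, max_storage_time, now):
    """Return the files that are older than the retention period.

    The file currently being written to is never selected, even if its
    newest event is older than the cutoff. Without that guard, a quiet
    period longer than max_storage_time would delete the active file.
    """
    cutoff = now - max_storage_time
    return [
        f for f in files
        if f.last_event_time < cutoff and f.name != active_file_name
    ]
```

With a 24 hour retention and no events for 26 hours, only the older, inactive file is selected; the active file survives and new events still have somewhere to go.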
Can you confirm whether or not you would expect to see 24 hours pass without any provenance data?
On Apr 9, 2020, at 4:32 AM, Dobbernack, Harald (Key-Work) <firstname.lastname@example.org> wrote:
What I noticed is that as long as provenance is working there will be *.prov files in the directory. When Provenance isn’t working these files are not to be seen. Maybe some Cleaning Process deletes those files prematurely or the process building them doesn’t work any more?
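A quick way to check for this symptom is to look for *.prov files under the provenance directory. The sketch below assumes you pass in the repository path yourself; the function name find_prov_files is made up, and searching subdirectories recursively is an assumption, since the exact directory layout may differ between NiFi versions and repository implementations.

```python
from pathlib import Path


def find_prov_files(repo_dir):
    """Return all *.prov files found under repo_dir, sorted by path."""
    return sorted(p for p in Path(repo_dir).rglob("*.prov") if p.is_file())


def looks_broken(repo_dir):
    """True if no *.prov file exists, the symptom described in this thread."""
    return not find_prov_files(repo_dir)
```

If looks_broken returns True while flows are running, the installation is probably in the state described above.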
This is something I experience too from time to time. My quick and dirty workaround is to stop NiFi, delete everything in the provenance directory, and restart. Then provenance is usable again (of course only with data since the delete). I’m hoping very much that there is a better way, that someone can show us better settings, or that a potential bug can be discovered.
When I go to "View data provenance" in NiFi, I never see any logs for my flow. Am I missing some configuration setting somewhere?