Hi Jim,

Apologies for the terse response earlier; I was typing from my phone.

I am assuming you are on a Linux system.

First and foremost, do check out the System Administrator's Guide [1]. In particular, scope out the configuration best practices [2], which will have you increase your open file handle limit.
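Since you are already scripting in Python, a quick way to see the limit your process is actually running under is the standard `resource` module. This is just a sanity check sketch; the values you see will vary per system, and the target value you raise them to depends on your flow:

```python
import resource

# Soft/hard limits on open file descriptors for the current process.
# The best practices guide suggests raising these well above typical
# Linux defaults; the right target depends on your flow and system.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open file limit: soft=%d hard=%d" % (soft, hard))
```

If the soft limit is still at a low default (often 1024), a handle leak in a script will exhaust it quickly.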

I do suspect your hunches are correct. While raising the limit will help and may avoid the immediate issue, getting those resources properly closed is the right thing to track down.

Regardless of whether you are in production or development, there are certainly ways to manage this more closely and work files through in an iterative manner.

Please report back if these avenues don't solve your issues and we can dive a little deeper if needed.

[1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
[2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices

On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <jsmcmahon3@gmail.com> wrote:
Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my content, flowfile, and provenance repositories on separate independent disk devices. In my nifi.properties file, nifi.flowfile.repository.partitions equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000. Since I am currently in development and so this is not a production process, I have set nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.

In fact I have not yet applied the best practices from the Sys Admin Guide. I will speak with our system administrators about doing this today. I am a little hesitant to just jump into making the seven system changes detailed there. NiFi does run on this box, but so do other processes that may be impacted. What's good for NiFi may not be good for those other processes, and so I want to ask first.

My scripts employ a Python stream callback to grab values from select attributes, populate those into a Python dictionary object, generate a JSON object from that dictionary, and replace the flowfile contents with that JSON object. These scripts are called by ExecuteScript processors. Similar scripts are used at various points throughout my workflow, near the end of each branch. Those had been working without any problems until I tried to introduce Python logging yesterday. I suspect I am not releasing file handler resources and logger objects as flowfiles pass through these ExecuteScript processors - maybe? I really am only making educated guesses at this stage. My first objective today is to get NiFi to come back up.
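For reference, the leak I suspect looks something like the sketch below (logger name and log path are made up for illustration). A logging FileHandler opens a file descriptor when it is created, so adding one per ExecuteScript invocation without ever removing it would leak one descriptor per flowfile:

```python
import logging

# Illustrative only: a FileHandler opens a file descriptor when created.
# Adding one per ExecuteScript invocation without removing it would leak
# one descriptor per flowfile - which is my suspicion here.
log = logging.getLogger("flow_logger_example")  # hypothetical name
log.setLevel(logging.INFO)

handler = logging.FileHandler("/tmp/flow_example.log")  # hypothetical path
log.addHandler(handler)
try:
    log.info("processing flowfile")
finally:
    # Release the descriptor so repeated invocations do not accumulate.
    log.removeHandler(handler)
    handler.close()
```

If my scripts were creating the handler in the callback without the remove/close step, that would explain the descriptors piling up.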

Please tell me: I am in a dev state right now, but had I been in production, what would the repercussions have been of deleting the flowfile_repository in its entirety, including all its journal files?

Thanks very much in advance for your help.

Jim

On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
Hi Jim,

In getting to the root cause, could you please provide information on your environment?  Did you apply the best practices listed in the System Administrator's guide?  Could you provide some details on what your scripts are doing?

If the data is not of importance, removing the Flowfile Repo should get you going. You can additionally remove the content repo, but this should be cleaned up by the framework as no flowfiles will point to said content. 


Aldrin Piri
Sent from my mobile device. 

On Mar 28, 2017, at 06:12, James McMahon <jsmcmahon3@gmail.com> wrote:

I noticed, too, that I have many partitions, partition-0 to partition-255 to be exact. These all have journal files in them. So I suspect that the journal file I cited is not specifically the problem in and of itself, but instead is the point where the allowable open files threshold is reached. I'm wondering if I have to recover by deleting all these partitions? -Jim

On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <jsmcmahon3@gmail.com> wrote:
While trying to use Python logging from two scripts I call via two independent ExecuteScript processors, I seem to have inadvertently created a condition where I have too many files open. This is causing a serious challenge for me, because when I attempt to start nifi (v0.7.1) it fails.

The log indicates that the flow controller cannot be started, and it cites the cause as this:
org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
.
. (many stack trace entries)
.
Caused by: java.nio.file.FileSystemException: /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many files open

In a situation like this, what is the best practice for recovery? Is it permissible to simply delete this journal file? What are the negative repercussions of doing that?

I did already try deleting my provenance_repository, but that did not allow nifi to restart. (NiFi did re-establish my provenance_repository at restart).

Thanks very much in advance for your help. -Jim