nifi-users mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: Cannot Restart Nifi
Date Tue, 28 Mar 2017 12:17:59 GMT
Jim,

It is possible, even likely, that raising the number of file handles Linux
allows a process to have will get NiFi back on track.
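
A minimal diagnostic sketch can show how close a running process is to that limit. It assumes a Linux host with /proc mounted. It only reports the numbers; the limit itself is raised through the OS configuration covered in the Administration Guide, not from this script.

```python
import os
import sys


def open_file_usage(pid="self"):
    """Return (open_count, soft_limit, hard_limit) for a Linux process.

    Counts the entries in /proc/<pid>/fd and reads the "Max open files" row
    of /proc/<pid>/limits. Limits come back as strings because either one may
    be the literal word "unlimited". Inspecting another user's process
    generally requires running as that user or as root.
    """
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                fields = line.split()
                # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
                return open_count, fields[3], fields[4]
    raise RuntimeError(f"no 'Max open files' row for pid {pid}")


if __name__ == "__main__":
    # Pass the NiFi JVM's pid as the first argument; with no argument the
    # script reports on its own process.
    target = sys.argv[1] if len(sys.argv) > 1 else "self"
    used, soft, hard = open_file_usage(target)
    print(f"pid {target}: {used} open, soft limit {soft}, hard limit {hard}")
```

A soft limit of 1024 here would match the default mentioned later in the thread.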

Thanks
Joe

On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <jsmcmahon3@gmail.com> wrote:
> No apology necessary Aldrin. I'm much obliged to you and to Joe for all your
> help. My game plan is as follows:
> 1- speak with the admin of my Linux box about executing all the sys admin
> "best practice" changes
> 2- barring doing them all, at minimum increase the maximum permitted open
> files from 1024 to 50000
> 3- reboot my Linux box, and then attempt to start NiFi
> 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start NiFi,
> get in there, and eliminate that Python logging. Find another way to log
> results to a system file, perhaps using a NiFi processor.
>
> - Jim
>
> On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <aldrinpiri@gmail.com> wrote:
>>
>> Hi Jim,
>>
>> Apologies for the terse response earlier; I was typing from my phone.
>>
>> I am assuming you are on a Linux system.
>>
>> First and foremost, do check out the Sys Admin guide [1]. In particular,
>> look over the configuration best practices [2], which will have you
>> increase your open file handle limit.
>>
>> I do suspect that your hunches are correct. While raising the limit will
>> help and may even avoid the issue, tracking down why those resources are
>> not being properly closed is the right thing to fix.
>>
>> Regardless of state, production or dev, there are ways to manage this more
>> carefully and work through the files iteratively.
>>
>> Please report back if these avenues don't solve your issues and we can
>> dive a little deeper if needed.
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
>> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
>>
>> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <jsmcmahon3@gmail.com>
>> wrote:
>>>
>>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
>>> content, flowfile, and provenance repositories on separate independent disk
>>> devices. In my nifi.properties file, nifi.flowfile.repository.partitions
>>> equals 256, and always.sync is false. My nifi.queue.swap.threshold is 20000.
>>> Since I am currently in development and so this is not a production process,
>>> I have set nifi.flowcontroller.autoResumeState to false. In
>>> conf/bootstrap.conf, my JVM memory settings are -Xms1024m and -Xmx4096m.
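
To confirm which of these settings are actually in effect, they can be read back from the file. The sketch below is a deliberately simplified parser: it ignores Java .properties escapes and line continuations, and it uses the values described above as sample input. The full key name `nifi.flowfile.repository.always.sync` is an assumption based on the shorthand `always.sync`.

```python
def load_properties(text):
    """Parse key=value lines from nifi.properties-style text into a dict.

    Simplified on purpose: blank lines and lines starting with '#' or '!' are
    skipped, and each remaining line is split on its first '='. Java
    .properties escapes, ':' separators, and line continuations are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


# Sample input built from the values described in the thread. The full key
# name nifi.flowfile.repository.always.sync is assumed from "always.sync".
SAMPLE = """\
# Flowfile repository and queue settings
nifi.flowfile.repository.partitions=256
nifi.flowfile.repository.always.sync=false
nifi.queue.swap.threshold=20000
nifi.flowcontroller.autoResumeState=false
"""

if __name__ == "__main__":
    # For a real install, read conf/nifi.properties and pass its text instead.
    for key, value in load_properties(SAMPLE).items():
        print(f"{key} = {value}")
```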
>>>
>>> In fact, I have not yet applied the best practices from the Sys Admin
>>> Guide. I will speak with my admin about doing this today. I am a little
>>> hesitant to jump straight into making the seven system changes you detail.
>>> NiFi does run on this box, but so do other processes that may be impacted.
>>> What's good for NiFi may not be good for these other processes, so I want
>>> to ask first.
>>>
>>> My scripts employ a Python stream callback to grab values from select
>>> attributes, populate those into a Python dictionary, generate a JSON
>>> object from that dictionary, and replace the flowfile contents with
>>> that JSON. These scripts are called by ExecuteScript
>>> processors. Similar scripts are used at various points throughout my
>>> workflow, near the end of each branch. Those had been working without any
>>> problems until I tried to introduce Python logging yesterday. I suspect I am
>>> not releasing file handler resources and logger objects as flowfiles flow
>>> through these ExecuteScript processors - maybe? I really am only making
>>> educated guesses at this stage. My first objective today is to get NiFi to
>>> come back up.
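
One plausible mechanism for the leak, assuming the ExecuteScript interpreter (and therefore Python's logging registry) persists between invocations: `logging.getLogger(name)` returns the same logger object every time, so a script that calls `addHandler(logging.FileHandler(...))` on each run accumulates handlers, each holding an open file. A minimal guard that attaches the handler only once might look like this. It is written to run on both CPython and Jython 2.7.

```python
import logging
import os


def get_file_logger(name, path):
    """Return logger `name`, attaching a FileHandler for `path` only once.

    logging.getLogger(name) hands back the same object for as long as the
    interpreter lives, so adding a new FileHandler on every run would stack up
    handlers, each holding its own open file descriptor.
    """
    logger = logging.getLogger(name)
    target = os.path.abspath(path)
    for handler in logger.handlers:
        if isinstance(handler, logging.FileHandler) and handler.baseFilename == target:
            return logger  # already attached on an earlier run: reuse it
    handler = logging.FileHandler(target)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def close_file_logger(name):
    """Detach and close every handler on logger `name`, releasing its files."""
    logger = logging.getLogger(name)
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
```

In a script body this would be called as, for example, `get_file_logger("json_builder", "/tmp/json_builder.log")`, where both arguments are hypothetical. ExecuteScript also binds a `log` variable (the processor's NiFi logger). It writes through NiFi's own logging to `logs/nifi-app.log` and does not open any extra files.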
>>>
>>> Please tell me: while I am in a dev state right now, had I been in a
>>> production state what would have been the repercussions of deleting in its
>>> entirety the flowfile_repository, which includes all its journal files?
>>>
>>> Thanks very much in advance for your help.
>>>
>>> Jim
>>>
>>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <aldrinpiri@gmail.com>
>>> wrote:
>>>>
>>>> Hi Jim,
>>>>
>>>> In getting to the root cause, could you please provide information on
>>>> your environment?  Did you apply the best practices listed in the System
>>>> Administrator's guide?  Could you provide some details on what your scripts
>>>> are doing?
>>>>
>>>> If the data is not of importance, removing the Flowfile Repo should get
>>>> you going. You can additionally remove the content repo, but this should be
>>>> cleaned up by the framework as no flowfiles will point to said content.
>>>>
>>>>
>>>> Aldrin Piri
>>>> Sent from my mobile device.
>>>>
>>>> On Mar 28, 2017, at 06:12, James McMahon <jsmcmahon3@gmail.com> wrote:
>>>>
>>>> I noticed, too, that I have many partitions, partition-0 to
>>>> partition-255 to be exact. These all have journal files in them. So I
>>>> suspect that the journal file I cited is not specifically the problem in and
>>>> of itself, but instead is the point where the allowable open files threshold
>>>> is reached. I'm wondering if I have to recover by deleting all these
>>>> partitions? -Jim
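
Before deleting anything, a read-only inventory of the repository can confirm how many journals there are and where they live. The sketch below assumes only the `partition-N/*.journal` layout visible in the error message; it lists files without opening, moving, or deleting them.

```python
import os


def journal_counts(repo_dir):
    """Map each partition-N directory under repo_dir to its *.journal count.

    Read-only: the function lists directories but never opens, moves, or
    deletes the journal files themselves.
    """
    counts = {}
    for entry in sorted(os.listdir(repo_dir)):
        path = os.path.join(repo_dir, entry)
        if entry.startswith("partition-") and os.path.isdir(path):
            counts[entry] = sum(1 for name in os.listdir(path) if name.endswith(".journal"))
    return counts
```

For example, `counts = journal_counts("/mnt/flow_repo/flowfile_repository")` uses the path from the error message; `len(counts)` and `sum(counts.values())` then give the number of partitions and the total number of journals.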
>>>>
>>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <jsmcmahon3@gmail.com>
>>>> wrote:
>>>>>
>>>>> While trying to use Python logging from two scripts I call via two
>>>>> independent ExecuteScript processors, I seem to have inadvertently created a
>>>>> condition where I have too many files open. This is causing a serious
>>>>> challenge for me, because when I attempt to start nifi (v0.7.1) it fails.
>>>>>
>>>>> The log indicates that the flow controller cannot be started, and it
>>>>> cites the cause as this:
>>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
>>>>> .
>>>>> . (many stack trace entries)
>>>>> .
>>>>> Caused by: java.nio.file.FileSystemException:
>>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many
>>>>> files open
>>>>>
>>>>> In a situation like this, what is the best practice for recovery? Is it
>>>>> permissible to simply delete this journal file? What are the negative
>>>>> repercussions of doing that?
>>>>>
>>>>> I did already try deleting my provenance_repository, but that did not
>>>>> allow nifi to restart. (NiFi did re-establish my provenance_repository at
>>>>> restart).
>>>>>
>>>>> Thanks very much in advance for your help. -Jim
>>>>
>>>>
>>>
>>
>
