nifi-users mailing list archives

From Aldrin Piri <aldrinp...@gmail.com>
Subject Re: Cannot Restart Nifi
Date Tue, 28 Mar 2017 12:46:29 GMT
Jim,

To ease the load on NiFi at startup, you could also try setting
nifi.flowcontroller.autoResumeState to false in your nifi.properties.
Depending on how your flow and scripts are constructed, this may let you
work down, piece by piece, any large queues or file processing that could
be causing the issue at hand.  As a stopgap measure, you could also bypass
the possibly troublesome script processors and cache this data to disk
elsewhere.
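
If you would rather script that change than edit the file by hand, here is
a rough sketch. The helper name set_property and the sample text are made
up for illustration; only the property names come from nifi.properties.
It assumes simple key=value lines, so back up the real file before trying
anything like it.

```python
def set_property(text, key, value):
    """Return properties-file text with `key` set to `value`.

    Hypothetical helper: rewrites the first uncommented `key=...` line,
    or appends one if the key is absent. All other lines are left as-is.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out entries
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = "%s=%s" % (key, value)
            break
    else:
        # Key not found anywhere: add it at the end.
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


# Made-up sample content standing in for a real nifi.properties.
sample = (
    "# core\n"
    "nifi.flowcontroller.autoResumeState=true\n"
    "nifi.queue.swap.threshold=20000\n"
)
updated = set_property(sample, "nifi.flowcontroller.autoResumeState", "false")
```

With the property at false, processors should come up stopped, so you can
start them one at a time and watch the queues.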

On Tue, Mar 28, 2017 at 8:17 AM, Joe Witt <joe.witt@gmail.com> wrote:

> Jim,
>
> It is very possible/likely that correcting the number of file handles
> linux allows a process to have will get nifi back on track.
>
> Thanks
> Joe
>
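
Inline note on the file handle limit: a small sketch for checking what a
process is actually allowed, assuming Linux and a stock Python. The
function names are mine. It reports the limits of the Python process
itself, which are inherited from the launching user and shell, so run it
as the same user that starts NiFi. Reading /proc/<pid>/fd for another
user's process needs suitable permissions.

```python
import os
import resource


def nofile_limits():
    """Return (soft, hard) open-file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid):
    """Count the descriptors a process holds, via the Linux /proc filesystem."""
    return len(os.listdir("/proc/%d/fd" % pid))


# The soft limit is the one a process actually runs into.
soft, hard = nofile_limits()
in_use = open_fd_count(os.getpid())
print("soft=%s hard=%s in_use=%s" % (soft, hard, in_use))
```

Pointing open_fd_count at the NiFi pid while the flow runs should show
whether the count keeps climbing toward the soft limit.
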
> On Tue, Mar 28, 2017 at 8:13 AM, James McMahon <jsmcmahon3@gmail.com>
> wrote:
> > No apology necessary Aldrin. I'm much obliged to you and to Joe for all
> > your help. My game plan is as follows:
> > 1- speak with the admin of my Linux box about executing all the sys admin
> > "best practice" changes
> > 2- barring doing them all, at minimum increase max permitted open files
> > from 1024 to 50000
> > 3- reboot my Linux box, and then attempt to start NiFi
> > 4- if 3 fails, rm -rf ./flowfile_repository on this, my dev box. Start
> > nifi, get in there, and eliminate that Python logging. Find another way
> > to log results to a system file, perhaps using a NiFi processor.
> >
> > - Jim
> >
> > On Tue, Mar 28, 2017 at 7:54 AM, Aldrin Piri <aldrinpiri@gmail.com>
> > wrote:
> >>
> >> Hi Jim,
> >>
> >> Apologies for terse response earlier, was typing from phone.
> >>
> >> I am assuming you are on a Linux system.
> >>
> >> First and foremost, do check out the Sys Admin guide [1]. In particular,
> >> scope out the best practices [2] for configuration which will have you
> >> increase your open file handles.
> >>
> >> I do suspect that your hunches are correct, and while this will aid and
> >> maybe avoid the issue, getting those resources properly closed out will
> >> be the right thing to track down.
> >>
> >> Regardless of state, production or dev, there are certainly ways to
> >> manage this a bit more and work files through in an iterative manner.
> >>
> >> Please report back if these avenues don't solve your issues and we can
> >> dive a little deeper if needed.
> >>
> >> [1] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html
> >> [2] https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#configuration-best-practices
> >>
> >> On Tue, Mar 28, 2017 at 7:46 AM, James McMahon <jsmcmahon3@gmail.com>
> >> wrote:
> >>>
> >>> Hi Aldrin. Yes sir, of course: my environment is NiFi v0.7. I have my
> >>> content, flowfile, and provenance repositories on separate independent
> >>> disk devices. In my nifi.properties file,
> >>> nifi.flowfile.repository.partitions equals 256, and always.sync is
> >>> false. My nifi.queue.swap.threshold is 20000. Since I am currently in
> >>> development and so this is not a production process, I have set
> >>> nifi.flowcontroller.autoResumeState to false. In conf/bootstrap.conf, my
> >>> JVM memory settings are -Xms1024m and -Xmx4096m.
> >>>
> >>> In fact I have not yet applied the best practices from the Sys Admin
> >>> Guide. I will speak with them about doing this today. I am a little
> >>> hesitant to just jump into making the seven system changes you detail.
> >>> NiFi does run on this box, but so do other processes that may be
> >>> impacted. What's good for NiFi may not be good for these other
> >>> processes, and so I want to ask first.
> >>>
> >>> My scripts employ a Python stream callback to grab values from select
> >>> attributes, populate those into a Python dictionary object, generate a
> >>> json object from that dictionary object, and replace the flowfile
> >>> contents with that dictionary object. These scripts are called by
> >>> ExecuteScript processors. Similar scripts are used at various points
> >>> throughout my workflow, near the end of each branch. Those had been
> >>> working without any problems until I tried to introduce Python logging
> >>> yesterday. I suspect I am not releasing file handler resources and
> >>> logger objects as flowfiles flow through these ExecuteScript
> >>> processors - maybe? I really am only making educated guesses at this
> >>> stage. My first objective today is to get NiFi to come back up.
> >>>
> >>> Please tell me: while I am in a dev state right now, had I been in a
> >>> production state what would have been the repercussions of deleting in
> >>> its entirety the flowfile_repository, which includes all its journal
> >>> files?
> >>>
> >>> Thanks very much in advance for your help.
> >>>
> >>> Jim
> >>>
> >>> On Tue, Mar 28, 2017 at 6:57 AM, Aldrin Piri <aldrinpiri@gmail.com>
> >>> wrote:
> >>>>
> >>>> Hi Jim,
> >>>>
> >>>> In getting to the root cause, could you please provide information on
> >>>> your environment?  Did you apply the best practices listed in the
> >>>> System Administrator's guide?  Could you provide some details on what
> >>>> your scripts are doing?
> >>>>
> >>>> If the data is not of importance, removing the Flowfile Repo should
> >>>> get you going. You can additionally remove the content repo, but this
> >>>> should be cleaned up by the framework as no flowfiles will point to
> >>>> said content.
> >>>>
> >>>>
> >>>> Aldrin Piri
> >>>> Sent from my mobile device.
> >>>>
> >>>> On Mar 28, 2017, at 06:12, James McMahon <jsmcmahon3@gmail.com>
> >>>> wrote:
> >>>>
> >>>> I noticed, too, that I have many partitions, partition-0 to
> >>>> partition-255 to be exact. These all have journal files in them. So I
> >>>> suspect that the journal file I cited is not specifically the problem
> >>>> in and of itself, but instead is the point where the allowable open
> >>>> files threshold is reached. I'm wondering if I have to recover by
> >>>> deleting all these partitions? -Jim
> >>>>
> >>>> On Tue, Mar 28, 2017 at 5:58 AM, James McMahon <jsmcmahon3@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>> While trying to use Python logging from two scripts I call via two
> >>>>> independent ExecuteScript processors, I seem to have inadvertently
> >>>>> created a condition where I have too many files open. This is causing
> >>>>> a serious challenge for me, because when I attempt to start nifi
> >>>>> (v0.7.1) it fails.
> >>>>>
> >>>>> The log indicates that the flow controller cannot be started, and it
> >>>>> cites the cause as this:
> >>>>> org.apache.nifi.web.NiFiCoreException: Unable to start Flow Controller
> >>>>> .
> >>>>> . (many stack trace entries)
> >>>>> .
> >>>>> Caused by: java.nio.file.FileSystemException:
> >>>>> /mnt/flow_repo/flowfile_repository/partition-86/83856.journal: Too many
> >>>>> files open
> >>>>>
> >>>>> In a situation like this, what is the best practice for recovery? Is
> >>>>> it permissible to simply delete this journal file? What are the
> >>>>> negative repercussions of doing that?
> >>>>>
> >>>>> I did already try deleting my provenance_repository, but that did not
> >>>>> allow nifi to restart. (NiFi did re-establish my provenance_repository
> >>>>> at restart).
> >>>>>
> >>>>> Thanks very much in advance for your help. -Jim
> >>>>
> >>>>
> >>>
> >>
> >
>
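
On the suspected handler leak in the scripts described further up: I have
not seen the scripts, so this is only a guess at the pattern. In Python,
logging.getLogger(name) hands back the same logger object each time, so a
script that calls addHandler() with a fresh FileHandler on every run keeps
stacking up handlers, each with its own open file. A minimal sketch of one
way around that follows. The logger name, log path and helper names are
made up, and I have not run this inside ExecuteScript.

```python
import logging
import os
import tempfile


def get_file_logger(name, path):
    """Return a logger that has at most one FileHandler attached."""
    logger = logging.getLogger(name)
    # Only attach a handler the first time; later calls reuse it.
    if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def close_handlers(logger):
    """Detach and close every handler so its file descriptor is released."""
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()


# Hypothetical logger name and a throwaway path, for illustration only.
log_path = os.path.join(tempfile.mkdtemp(), "flow_script.log")
for _ in range(3):  # stands in for three separate script invocations
    log = get_file_logger("flow_script", log_path)
    log.info("processed one flowfile")
handler_count = len(log.handlers)
close_handlers(log)
```

The guard keeps one open file per logger no matter how many flowfiles pass
through. The alternative is to open and close a handler per invocation,
which is simpler to reason about but pays for an open/close on every
flowfile.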
