nifi-dev mailing list archives

From Michael Moser <>
Subject Re: heisenbug causing "lost" content claims
Date Fri, 04 Mar 2016 21:59:45 GMT
Thanks for the reply, Mark.

NIFI-1577 isn't the cause because I don't think we were using any processor
that does ProcessSession.append().
NIFI-1527 mentions a problem that occurs when NiFi starts, and our NiFi had
been running for several days.

Setting aside the cause of the "Too many open files" error for the moment,
here's what we saw when the NiFi JVM encountered it:

ERROR [Site-to-Site Worker Thread] o.a.nifi.remote.SocketRemoteSiteListener
Unable to communicate with remote instance due to
o.a.nifi.processor.exception.FlowFileAccessException: Failed to import data
from for
StandardFlowFileRecord[uuid=foo,claim=,offset=0,name=filename,size=0] due
to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to
create ContentClaim due to
content_repository/1/1-1 (Too many open files); closing connection

This NiFi instance was using a remote process group Input Port to accept
new files.  After the exception, it appears that a flowfile exists in the
flowfile_repository but its ContentClaim never gets created in the
content_repository.
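
On the "Too many open files" side, the limit in effect for a running process
may differ from what ulimit reports in a login shell, so it can help to read
the numbers for the NiFi JVM itself. Below is a minimal sketch, assuming Linux
and read access to the JVM's /proc entries (same user or root). The function
names parse_soft_nofile and fd_usage are illustrative and not part of NiFi.

```python
import os


def parse_soft_nofile(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None if the row is missing or the limit is 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_fd_count, soft_limit) for a running process on Linux."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft_limit = parse_soft_nofile(f.read())
    return open_fds, soft_limit
```

Calling fd_usage with the NiFi JVM's PID periodically (for example from cron)
and logging the pair would show whether the descriptor count climbs toward the
limit before the exception appears.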

-- Mike

On Fri, Mar 4, 2016 at 3:03 PM, Mark Payne <> wrote:

> Tony,
> The two tickets that come to mind are:
> <> (Too many open files)
> <> (ContentNotFound)
> Do these sound like they may be what is causing your issues?
> Thanks
> -Mark
> > On Mar 4, 2016, at 2:57 PM, Tony Kurc <> wrote:
> >
> > All,
> > I wanted to describe an issue on a nifi instance we've been using 0.4.1 on,
> > and why diagnosing it and reproducing it may be difficult. This is on a
> > linux server, where we have a reasonably high load, and the error happens
> > infrequently, but when it does, it really gums up operations.
> >
> > At some point we get an IOException for too many open files. (with an
> > awfully high limit of open files in ulimit, so not sure why that is
> > happening).
> >
> > Some time later, when trying to read a flowfile in a processor, we get a
> > ContentNotFoundException, presumably because a flowfile is pointing to
> > content that was never written. When this happens, we basically have to
> > remove the flowfile manually. If no one is watching at the moment, or the
> > processor that reads isn't configured to handle this, or you're not using
> > 0.5.x (where you can selectively remove flowfiles from a queue), this can
> > cause operational challenges.
> >
> > Because this happens so infrequently, I'm not sure if others have seen
> > this. I'm not sure if something in the framework may need adjustment when
> > a content claim goes wrong, but I really didn't expect that a flowfile with
> > no actual content could be created, which seems to be what happened
> > (rather than the content being deleted or corrupted).
> >
> > Has anyone else experienced this, or does anyone know if something in
> > 0.5.x may have addressed it? (Looking through the release notes, nothing
> > jumped out.)
> >
> > Tony
