nifi-dev mailing list archives

From Joe Witt <joe.w...@gmail.com>
Subject Re: heisenbug causing "lost" content claims
Date Fri, 04 Mar 2016 22:04:15 GMT
Mike,

Does this flow have a MergeContent processor in it?

Thanks
Joe

On Fri, Mar 4, 2016 at 4:59 PM, Michael Moser <moser.mw@gmail.com> wrote:
> Thanks for the reply, Mark.
>
> NIFI-1577 isn't the cause because I don't think we were using any processor
> that does ProcessSession.append().
> NIFI-1527 mentions a problem that occurs when NiFi starts, and our NiFi had
> been running for several days.
>
> Setting aside the cause of the "Too many open files" error for the moment,
> here's what we saw when the NiFi JVM encountered it:
>
> ERROR [Site-to-Site Worker Thread] o.a.nifi.remote.SocketRemoteSiteListener
> Unable to communicate with remote instance due to
> o.a.nifi.processor.exception.ProcessException:
> o.a.nifi.processor.exception.FlowFileAccessException: Failed to import data
> from org.apache.nifi.stream.io.MinimumLengthInputStream@1234 for
> StandardFlowFileRecord[uuid=foo,claim=,offset=0,name=filename,size=0] due
> to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to
> create ContentClaim due to java.io.FileNotFoundException:
> content_repository/1/1-1 (Too many open files); closing connection
>
> This NiFi instance was using a remote process group Input Port to accept
> new files.  After the exception, it appears that a flowfile exists in the
> flowfile_repository but its ContentClaim was never created in the
> content_repository.
>
> -- Mike
>
>
>
> On Fri, Mar 4, 2016 at 3:03 PM, Mark Payne <markap14@hotmail.com> wrote:
>
>> Tony,
>>
>> The two tickets that come to mind are:
>> https://issues.apache.org/jira/browse/NIFI-1577 (Too many open files)
>> https://issues.apache.org/jira/browse/NIFI-1527 (ContentNotFound)
>>
>> Do these sound like they may be what is causing your issues?
>>
>> Thanks
>> -Mark
>>
>>
>> > On Mar 4, 2016, at 2:57 PM, Tony Kurc <trkurc@gmail.com> wrote:
>> >
>> > All,
>> > I wanted to describe an issue on a NiFi instance running 0.4.1, and why
>> > diagnosing and reproducing it may be difficult. This is on a Linux server
>> > under reasonably high load. The error happens infrequently, but when it
>> > does, it really gums up operations.
>> >
>> > At some point we get an IOException for too many open files (even with
>> > an awfully high open-files limit in ulimit, so I'm not sure why that is
>> > happening).
>> >
>> > Some time later, when trying to read a flowfile in a processor, we get a
>> > ContentNotFoundException, presumably because a flowfile is pointing to
>> > content that was never written. When this happens, we basically have to
>> > remove the flowfile manually. If no one is watching at the moment, or
>> > the processor that reads isn't configured to handle this, or you're not
>> > using 0.5.x (where you can selectively remove flowfiles from a queue),
>> > this can cause operational challenges.
>> >
>> > Because this happens so infrequently, I'm not sure if others have seen
>> > this. I'm not sure whether something in the framework needs adjustment
>> > for the case where a content claim goes wrong, but I really didn't
>> > expect that a flowfile with no actual content could be created, which
>> > seems to be what happened (rather than the content being deleted or
>> > corrupted).
>> >
>> > Has anyone else experienced this, or does anyone know whether something
>> > in 0.5.x may have addressed it? (Looking through the release notes,
>> > nothing jumped out.)
>> >
>> > Tony
>>
>>
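
For the open question above of why the limit is hit despite a high ulimit,
one thing that can help is comparing the JVM's own view of its open file
descriptors against its effective limit. The following is a minimal sketch,
not anything from NiFi itself: the class name FdUsage and the method
fdCounts are hypothetical, and it relies on the JDK-specific
com.sun.management.UnixOperatingSystemMXBean, which is only available on
Unix-like platforms with HotSpot/OpenJDK-style JVMs. It reports on the JVM
it runs in, so to observe a NiFi process the same MXBean would have to be
read from inside that process or over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for the current JVM,
     * or {-1, -1} if this JVM does not expose them.
     */
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // The Unix-specific bean is a JDK extension, so check before casting.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return new long[] { -1L, -1L };
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        System.out.println("open=" + counts[0] + " max=" + counts[1]);
    }
}
```

If the reported maximum is much lower than the value shown by ulimit in an
interactive shell, the limit applied to the running process likely differs
from the one that was configured.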
