nifi-dev mailing list archives

From Joe Gresock <jgres...@gmail.com>
Subject Re: Recovery failure
Date Wed, 06 Sep 2017 17:01:57 GMT
Sorry, 144 was a typo.. there are 14 files.
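
To check whether the other partitions also hold more than the usual 1-2
files, something like the following could report the file count and total
size per partition directory. This is only a rough sketch: the class and
method names are made up for illustration, and the repository path is
whatever the node uses (/data/nifi/flowfile_repository in the error quoted
below).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of NiFi or WALI.
final class PartitionFileReport {

    /**
     * For each immediate subdirectory of repoDir, returns {fileCount, totalBytes},
     * keyed by the subdirectory name. Only regular files are counted.
     */
    static Map<String, long[]> summarize(Path repoDir) throws IOException {
        Map<String, long[]> report = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(repoDir)) {
            for (Path dir : dirs) {
                if (!Files.isDirectory(dir)) {
                    continue; // skip files that sit directly in the repository root
                }
                long count = 0;
                long bytes = 0;
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
                    for (Path file : files) {
                        if (Files.isRegularFile(file)) {
                            count++;
                            bytes += Files.size(file);
                        }
                    }
                }
                report.put(dir.getFileName().toString(), new long[] {count, bytes});
            }
        }
        return report;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PartitionFileReport <flowfile-repository-dir>");
            return;
        }
        for (Map.Entry<String, long[]> e : summarize(Paths.get(args[0])).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue()[0] + " files, "
                    + e.getValue()[1] + " bytes");
        }
    }
}
```

Any partition reporting far more than 2 files would be worth a closer look.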

Yes, it appears to have run out of disk space, so that's probably the root
cause.  Can you give me any pointers on how to carry out your two ideas?  How
would I look for the end of a record, so as to truncate it?
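
For the truncation step itself (not for finding the record boundary), a
minimal sketch might look like the one below. It assumes NiFi is stopped and
that the byte offset where the last complete record ends has already been
worked out some other way; the code knows nothing about the WALI record
format. The class name, method name, and arguments are made up for
illustration. The backup is written to a separate directory so that no extra
file lands in the partition directory, and the copy fails rather than
overwriting an earlier backup.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of NiFi or WALI.
final class TruncatePartitionFile {

    /**
     * Copies file into backupDir (failing if a copy already exists there),
     * then truncates file to keepBytes. Returns the path of the backup copy.
     * keepBytes is an assumption supplied by the caller: the offset at which
     * the last complete record ends.
     */
    static Path truncateWithBackup(Path file, Path backupDir, long keepBytes)
            throws IOException {
        long size = Files.size(file);
        if (keepBytes < 0 || keepBytes > size) {
            throw new IllegalArgumentException(
                    "keepBytes must be between 0 and " + size + ", got " + keepBytes);
        }
        Path backup = backupDir.resolve(file.getFileName().toString());
        Files.copy(file, backup); // throws FileAlreadyExistsException if present
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
            channel.truncate(keepBytes); // drop everything after keepBytes
            channel.force(true);         // flush the change to disk
        }
        return backup;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: TruncatePartitionFile <file> <backup-dir> <keep-bytes>");
            return;
        }
        Path backup = truncateWithBackup(
                Paths.get(args[0]), Paths.get(args[1]), Long.parseLong(args[2]));
        System.out.println("Backup written to " + backup);
    }
}
```

The backup directory should be on a volume with free space, given that the
repository disk filled up.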

On Wed, Sep 6, 2017 at 4:55 PM, Mark Payne <markap14@hotmail.com> wrote:

> Hmmm ok interesting... once it hits an EOFException it is assuming that
> there is no more data in the partition.
> Clearly, there is because it then fails when calling endRecovery(). Did
> you perhaps run out of disk space on your FlowFile
> Repo while it was running or hit an OutOfMemoryError? Perhaps that would
> cause an EOFException and then continue writing.
>
> The fact that there are 144 files in that directory is also very odd...
> there are generally only 1-2 files in that directory. Do all of your
> partitions have that many files? Any errors before the restart about not
> being able to checkpoint the FlowFile Repo?
>
> At this point, I'm not entirely sure what can be done, other than to
> perhaps try to manually truncate that last record in the Partition
> that is causing the EOFException. Or perhaps the
> MinimalLockingWriteAheadLog could be updated to not assume that EOFException
> implies that the partition no longer has data in it. Unfortunately,
> though, I'm not seeing any easy work around.
>
> > On Sep 6, 2017, at 12:37 PM, Joe Gresock <jgresock@gmail.com> wrote:
> >
> > Yes, I do see:
> > ERROR [main] org.wali.MinimalLockingWriteAheadLog
> > org.wali.MinimalLockingWriteAheadLog@1e620fe7 unexpectedly reached
> > End-of-File when reading from Partition-214 for Transaction ID 1918212626;
> > assuming crash and ignoring this transaction.
> >
> > In that directory, I see 144 files, totalling ~120MB.  The first two files
> > are multi-megabyte files, and the other 12 are all either 7K or 4K.
> >
> > On Wed, Sep 6, 2017 at 4:30 PM, Mark Payne <markap14@hotmail.com> wrote:
> >
> >> Joe,
> >>
> >> Any other errors in the logs? Specifically, looking for errors that
> >> contain the text:
> >> unexpectedly reached End-of-File when reading from
> >>
> >> or:
> >> unexpectedly found End-of-File when reading from
> >>
> >> This is not something that I've ever run into personally, but I'm looking
> >> through the code, trying to understand what may cause this.
> >>
> >> Also, if you look at the files in
> >> /data/nifi/flowfile_repository/partition-8,
> >> how many files are there, and how large are they?
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>
> >> On Sep 6, 2017, at 12:22 PM, Joe Gresock <jgresock@gmail.com> wrote:
> >>
> >> 1.1.0, it's not on a system I can copy/paste from, but here's part of the
> >> stack trace:
> >>
> >> at org.wali.MinimalLockingWriteAheadLog$Partition.endRecovery(MinimalLockingWriteAheadLog.java:1047) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
> >> at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:487) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
> >> at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
> >>
> >> On Wed, Sep 6, 2017 at 4:13 PM, Mark Payne <markap14@hotmail.com> wrote:
> >>
> >> Joe,
> >>
> >> What version of NiFi are you running? Do you have a stack trace?
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >> On Sep 6, 2017, at 11:59 AM, Joe Gresock <jgresock@gmail.com> wrote:
> >>
> >> I'm wondering if there is a way to recover from this scenario:
> >>
> >> ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow
> >> from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to
> >> connect node to cluster due to: java.lang.IllegalStateException: Signaled
> >> end to recovery, but there are more recovery files for Partition in
> >> directory /data/nifi/flowfile_repository/partition-8
> >>
> >> I have nearly a TB of files in my content_repository, so I'd really like to
> >> be able to salvage this node, but I'm not sure how to proceed, as the node
> >> won't start up.
> >>
> >> --
> >> I know what it is to be in need, and I know what it is to have plenty.  I
> >> have learned the secret of being content in any and every situation,
> >> whether well fed or hungry, whether living in plenty or in want.  I can do
> >> all this through him who gives me strength.    *-Philippians 4:12-13*
> >>
> >
>
>


-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*
