nifi-dev mailing list archives

From Mark Payne <marka...@hotmail.com>
Subject Re: Recovery failure
Date Thu, 07 Sep 2017 12:56:43 GMT
Hey Joe,

Awesome! Glad that you were able to address the issue. I think a contrib for that
would be great, if you don't mind. Would be happy to review & merge it.

Thanks
-Mark


> On Sep 6, 2017, at 3:00 PM, Joe Gresock <jgresock@gmail.com> wrote:
> 
> Mark,
> 
> I took the second approach, since the nifi-toolkit-flowfile-repo project
> doesn't appear to exist at version 1.1.0.  I added a line to attempt to get
> the next recoverable transaction ID as you suggested, and it started up
> successfully!  Thanks for your help.
> 
> Is this something that should be contributed, or is it moot with the latest
> version?
> 
> Joe
> 
> On Wed, Sep 6, 2017 at 5:18 PM, Joe Gresock <jgresock@gmail.com> wrote:
> 
>> Thanks Mark, that's the kind of thing I was looking for, this gives me a
>> good starting point.
>> 
>> Joe
>> 
>> On Wed, Sep 6, 2017 at 5:09 PM, Mark Payne <markap14@hotmail.com> wrote:
>> 
>>> Joe,
>>> 
>>> If you wanted to go the route of truncating it, I would recommend
>>> starting with the nifi-toolkit-flowfile-repo module and updating that.
>>> It has all the dependencies already in place to read the repository and
>>> update it. You would want to just read each transaction from a partition
>>> and write it to a new file until you hit the EOFException, and then just
>>> discard that transaction.
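
A minimal sketch of the truncation idea above, for illustration only. It
assumes a made-up record layout (8-byte transaction ID, 4-byte payload
length, payload bytes), which is NOT the real WALI on-disk format, and the
class and method names are hypothetical. A real tool would read the
repository with the serialization code already in the
nifi-toolkit-flowfile-repo module.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TruncatePartialRecord {

    /**
     * Copies every complete record from source to target and drops a
     * trailing, partially written record.
     *
     * HYPOTHETICAL layout (not the real WALI format):
     * 8-byte transaction ID, 4-byte payload length, payload bytes.
     *
     * @return the number of complete records copied
     */
    public static int copyCompleteRecords(InputStream source, OutputStream target)
            throws IOException {
        DataInputStream in = new DataInputStream(source);
        DataOutputStream out = new DataOutputStream(target);
        int copied = 0;
        while (true) {
            try {
                // Read the whole record into memory before writing anything,
                // so a record cut short by EOF never reaches the new file.
                long transactionId = in.readLong();
                int length = in.readInt();
                if (length < 0) {
                    throw new IOException("Corrupt record: negative length " + length);
                }
                byte[] payload = new byte[length];
                in.readFully(payload);

                out.writeLong(transactionId);
                out.writeInt(length);
                out.write(payload);
                copied++;
            } catch (EOFException e) {
                // Either a clean end of file or a half-written record:
                // in both cases stop and discard whatever was partially read.
                break;
            }
        }
        out.flush();
        return copied;
    }
}
```

The sketch trusts the length field, so a corrupt length could trigger a very
large allocation; a real tool would want a sanity limit there. It should also
write to a new file and leave the original partition file untouched until the
copy has been verified.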
>>> 
>>> The other option - not assuming that an EOFException means the partition
>>> is out of data - would mean updating MinimalLockingWriteAheadLog (in the
>>> nifi-commons/nifi-write-ahead-log module). Around lines 472-479, update
>>> the logic so that if an Exception is caught there, we call
>>> nextPartition.getNextRecoverableTransactionId() again if the partition
>>> does actually have more data (this may require adding some sort of
>>> isRecoveryDataAvailable() method, or something like that, on the
>>> Partition class).
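
A rough sketch of the second option, using a toy model rather than the real
code. ToyPartition, its add() helper, and the recover() loop are all
hypothetical stand-ins; only the method names
getNextRecoverableTransactionId() and isRecoveryDataAvailable() come from the
suggestion above, and their real signatures in MinimalLockingWriteAheadLog
may differ.

```java
import java.io.EOFException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SkipPartialTransaction {

    /** Toy stand-in for a partition's edit log; NOT the real org.wali Partition. */
    public static final class ToyPartition {

        private static final class Edit {
            final long transactionId;
            final boolean complete;

            Edit(long transactionId, boolean complete) {
                this.transactionId = transactionId;
                this.complete = complete;
            }
        }

        private final Deque<Edit> edits = new ArrayDeque<>();

        /** Test helper: complete=false models a partially written transaction. */
        public void add(long transactionId, boolean complete) {
            edits.addLast(new Edit(transactionId, complete));
        }

        /** Hypothetical method: is there anything left to read? */
        public boolean isRecoveryDataAvailable() {
            return !edits.isEmpty();
        }

        /** Returns the next transaction ID, or null when no data remains. */
        public Long getNextRecoverableTransactionId() {
            Edit head = edits.peekFirst();
            if (head == null) {
                return null;
            }
            return Long.valueOf(head.transactionId);
        }

        /** Consumes the next transaction; a partial one surfaces as EOFException. */
        public void recoverNextTransaction() throws EOFException {
            Edit head = edits.removeFirst();
            if (!head.complete) {
                throw new EOFException("Partial transaction " + head.transactionId);
            }
        }
    }

    /** Recovers all complete transactions, skipping partial ones. */
    public static List<Long> recover(ToyPartition partition) {
        List<Long> recovered = new ArrayList<>();
        Long nextId = partition.getNextRecoverableTransactionId();
        while (nextId != null) {
            try {
                partition.recoverNextTransaction();
                recovered.add(nextId);
            } catch (EOFException e) {
                // Do not assume EOF means "no more data": stop only if the
                // partition really is exhausted, otherwise keep going.
                if (!partition.isRecoveryDataAvailable()) {
                    break;
                }
            }
            nextId = partition.getNextRecoverableTransactionId();
        }
        return recovered;
    }
}
```

In this model, a partition holding transactions 1 (complete), 2 (partial) and
3 (complete) recovers 1 and 3 instead of stopping at 2 and leaving data
behind.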
>>> 
>>> Does this help?
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> On Sep 6, 2017, at 1:01 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>> 
>>> Sorry, 144 was a typo.. there are 14 files.
>>> 
>>> Yes, it appears to have run out of disk space, so that's probably the root
>>> cause.  Can you give me any ideas on how to carry out your two ideas?  How
>>> would I look for the end of a record, so as to truncate it?
>>> 
>>> On Wed, Sep 6, 2017 at 4:55 PM, Mark Payne <markap14@hotmail.com> wrote:
>>> 
>>> Hmmm ok interesting... once it hits an EOFException it is assuming that
>>> there is no more data in the partition.
>>> Clearly, there is because it then fails when calling endRecovery(). Did
>>> you perhaps run out of disk space on your FlowFile
>>> Repo while it was running or hit an OutOfMemoryError? Perhaps that would
>>> cause an EOFException and then continue writing.
>>> 
>>> The fact that there are 144 files in that directory is also very odd...
>>> there are generally only 1-2 files in that directory. Do all of your
>>> partitions have that many files? Any errors before the restart about not
>>> being able to checkpoint the FlowFile Repo?
>>> 
>>> At this point, I'm not entirely sure what can be done, other than to
>>> perhaps try to manually truncate that last record in the Partition
>>> that is causing the EOFException. Or perhaps the
>>> MinimalLockingWriteAheadLog could be updated to not assume that an
>>> EOFException implies that the partition no longer has data in it.
>>> Unfortunately, though, I'm not seeing any easy workaround.
>>> 
>>> On Sep 6, 2017, at 12:37 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>> 
>>> Yes, I do see:
>>> ERROR [main] org.wali.MinimalLockingWriteAheadLog
>>> org.wali.MinimalLockingWriteAheadLog@1e620fe7 unexpectedly reached
>>> End-of-File when reading from Partition-214 for Transaction ID
>>> 1918212626;
>>> assuming crash and ignoring this transaction.
>>> 
>>> In that directory, I see 144 files, totalling ~120MB.  The first two
>>> files are multi-megabyte files, and the other 12 are all either 7K or 4K.
>>> 
>>> On Wed, Sep 6, 2017 at 4:30 PM, Mark Payne <markap14@hotmail.com> wrote:
>>> 
>>> Joe,
>>> 
>>> Any other errors in the logs? Specifically, looking for errors that
>>> contain the text:
>>> unexpectedly reached End-of-File when reading from
>>> 
>>> or:
>>> unexpectedly found End-of-File when reading from
>>> 
>>> This is not something that I've ever run into personally, but I'm looking
>>> through the code, trying to understand what may cause this.
>>> 
>>> Also, if you look at the files in
>>> /data/nifi/flowfile_repository/partition-8,
>>> how many files are there in there, and how large are they?
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> 
>>> On Sep 6, 2017, at 12:22 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>> 
>>> 1.1.0, it's not on a system I can copy/paste from, but here's part of
>>> the stack trace:
>>> 
>>> at org.wali.MinimalLockingWriteAheadLog$Partition.endRecovery(MinimalLockingWriteAheadLog.java:1047) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>> at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:487) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>> at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>> 
>>> On Wed, Sep 6, 2017 at 4:13 PM, Mark Payne <markap14@hotmail.com> wrote:
>>> 
>>> Joe,
>>> 
>>> What version of NiFi are you running? Do you have a stack trace?
>>> 
>>> Thanks
>>> -Mark
>>> 
>>> 
>>> On Sep 6, 2017, at 11:59 AM, Joe Gresock <jgresock@gmail.com> wrote:
>>> 
>>> I'm wondering if there is a way to recover from this scenario:
>>> 
>>> ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow
>>> from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed
>>> to connect node to cluster due to: java.lang.IllegalStateException:
>>> Signaled end to recovery, but there are more recovery files for Partition
>>> in directory /data/nifi/flowfile_repository/partition-8
>>> 
>>> I have nearly a TB of files in my content_repository, so I'd really like
>>> to be able to salvage this node, but I'm not sure how to proceed, as the
>>> node won't start up.
>>> 
>>> --
>>> I know what it is to be in need, and I know what it is to have plenty.  I
>>> have learned the secret of being content in any and every situation,
>>> whether well fed or hungry, whether living in plenty or in want.  I can do
>>> all this through him who gives me strength.    *-Philippians 4:12-13*
>>> 

