nifi-dev mailing list archives

From Joe Gresock <jgres...@gmail.com>
Subject Re: Recovery failure
Date Wed, 06 Sep 2017 19:00:24 GMT
Mark,

I took the second approach, since the nifi-toolkit-flowfile-repo project
doesn't appear to exist at version 1.1.0.  I added a line to attempt to get
the next recoverable transaction ID as you suggested, and it started up
successfully!  Thanks for your help.

Is this something that should be contributed, or is it moot with the latest
version?

Joe

On Wed, Sep 6, 2017 at 5:18 PM, Joe Gresock <jgresock@gmail.com> wrote:

> Thanks Mark, that's the kind of thing I was looking for, this gives me a
> good starting point.
>
> Joe
>
> On Wed, Sep 6, 2017 at 5:09 PM, Mark Payne <markap14@hotmail.com> wrote:
>
>> Joe,
>>
>> If you wanted to go the route of truncating it, I would recommend starting
>> with the nifi-toolkit-flowfile-repo module and updating that. It has the
>> dependencies all already in place to read the repository and update it.
>> You would want to just read each transaction from a partition and write it
>> to a new file until you hit the EOFException, and then just discard that
>> transaction.
>>
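A minimal sketch of the truncation idea described above, for illustration only. It assumes a hypothetical layout in which each record is a 4-byte length followed by that many payload bytes; the real WALI partition format is different and is not shown in this thread, and the class and method names here are made up.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class TruncatePartitionSketch {

    /**
     * Copies every complete record from rawIn to rawOut and returns the
     * number of records copied. A record that is cut short by end-of-file
     * is discarded, which is the "truncate the last record" idea.
     *
     * ASSUMPTION: records are [int length][payload bytes]. This is a
     * stand-in format, not the actual NiFi write-ahead log encoding.
     */
    static int copyCompleteRecords(InputStream rawIn, OutputStream rawOut) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        DataOutputStream out = new DataOutputStream(rawOut);
        int copied = 0;
        while (true) {
            try {
                // readInt() throws EOFException if fewer than 4 bytes remain.
                int length = in.readInt();
                if (length < 0) {
                    throw new IOException("Negative record length: " + length);
                }
                // A corrupt length could be very large; a real tool should
                // cap it before allocating.
                byte[] payload = new byte[length];
                // readFully() throws EOFException if the payload is cut short.
                in.readFully(payload);
                // Only write once the whole record has been read.
                out.writeInt(length);
                out.write(payload);
                copied++;
            } catch (EOFException e) {
                // Clean end of data or a torn final record: stop copying.
                break;
            }
        }
        out.flush();
        return copied;
    }
}
```

The record is written only after it has been read in full, so the new file never contains a partial record.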
>> The other option, not assuming that EOFException implies out of data,
>> would mean updating MinimalLockingWriteAheadLog (in the
>> nifi-commons/nifi-write-ahead-log module) and then, around lines 472-479,
>> updating the logic so that if an Exception is caught there, we call
>> nextPartition.getNextRecoverableTransactionId() again if the partition
>> does actually have more data (may require adding some sort of
>> isRecoveryDataAvailable() method or something like that on the Partition
>> class).
>>
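A toy model of the second option, to make the control flow concrete. This is not the MinimalLockingWriteAheadLog code and not the change that was actually applied: the Partition interface below is a hypothetical stand-in, recoverNextTransaction() takes no arguments here, and isRecoveryDataAvailable() is the method only suggested above. It also assumes the partition can move on to the next transaction after a torn one, which the real on-disk format may not make easy.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class RecoveryRetrySketch {

    /** Hypothetical stand-in for the WALI Partition class. */
    interface Partition {
        /** Next transaction ID that can be recovered, or null if none. */
        Long getNextRecoverableTransactionId();

        /** Recovers the next transaction; EOFException if it is torn. */
        void recoverNextTransaction() throws IOException;

        /** Hypothetical: true if more recovery data remains. */
        boolean isRecoveryDataAvailable();
    }

    /** Recovers what it can and returns the IDs that were recovered. */
    static List<Long> recover(Partition partition) throws IOException {
        List<Long> recovered = new ArrayList<>();
        Long txId = partition.getNextRecoverableTransactionId();
        while (txId != null) {
            try {
                partition.recoverNextTransaction();
                recovered.add(txId);
            } catch (EOFException e) {
                // Do not assume EOF means the partition is out of data.
                if (!partition.isRecoveryDataAvailable()) {
                    break;
                }
                // Otherwise ignore this transaction and keep going.
            }
            // Ask again for the next recoverable transaction ID.
            txId = partition.getNextRecoverableTransactionId();
        }
        return recovered;
    }

    /** In-memory fake used only to exercise the loop above. */
    static final class InMemoryPartition implements Partition {
        private final long[] ids;
        private final Set<Long> torn;
        private int pos = 0;

        InMemoryPartition(long[] ids, Set<Long> torn) {
            this.ids = ids;
            this.torn = torn;
        }

        @Override
        public Long getNextRecoverableTransactionId() {
            return pos < ids.length ? ids[pos] : null;
        }

        @Override
        public void recoverNextTransaction() throws IOException {
            long id = ids[pos++];
            if (torn.contains(id)) {
                throw new EOFException("Torn transaction " + id);
            }
        }

        @Override
        public boolean isRecoveryDataAvailable() {
            return pos < ids.length;
        }
    }
}
```

With transaction IDs 1, 2, 3 and transaction 2 torn, the loop skips 2 and still recovers 3 instead of ending recovery early.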
>> Does this help?
>>
>> Thanks
>> -Mark
>>
>>
>> On Sep 6, 2017, at 1:01 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>
>> Sorry, 144 was a typo.. there are 14 files.
>>
>> Yes, it appears to have run out of disk space, so that's probably the root
>> cause.  Can you give me any ideas on how to carry out your two ideas?  How
>> would I look for the end of a record, so as to truncate it?
>>
>> On Wed, Sep 6, 2017 at 4:55 PM, Mark Payne <markap14@hotmail.com> wrote:
>>
>> Hmmm ok interesting... once it hits an EOFException it is assuming that
>> there is no more data in the partition.
>> Clearly, there is because it then fails when calling endRecovery(). Did
>> you perhaps run out of disk space on your FlowFile
>> Repo while it was running or hit an OutOfMemoryError? Perhaps that would
>> cause an EOFException and then continue writing.
>>
>> The fact that there are 144 files in that directory is also very odd...
>> there are generally only 1-2 files in that directory. Do all of your
>> partitions have that many files? Any errors before the restart about not
>> being able to checkpoint the FlowFile Repo?
>>
>> At this point, I'm not entirely sure what can be done, other than to
>> perhaps try to manually truncate that last record in the Partition
>> that is causing the EOFException. Or perhaps the
>> MinimalLockingWriteAheadLog could be updated to not assume that
>> EOFException implies that the partition no longer has data in it.
>> Unfortunately, though, I'm not seeing any easy workaround.
>>
>> On Sep 6, 2017, at 12:37 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>
>> Yes, I do see:
>> ERROR [main] org.wali.MinimalLockingWriteAheadLog
>> org.wali.MinimalLockingWriteAheadLog@1e620fe7 unexpectedly reached
>> End-of-File when reading from Partition-214 for Transaction ID 1918212626;
>> assuming crash and ignoring this transaction.
>>
>> In that directory, I see 144 files, totalling ~120MB.  The first two files
>> are multi-megabyte files, and the other 12 are all either 7K or 4K.
>>
>> On Wed, Sep 6, 2017 at 4:30 PM, Mark Payne <markap14@hotmail.com> wrote:
>>
>> Joe,
>>
>> Any other errors in the logs? Specifically, looking for errors that
>> contain the text:
>> unexpectedly reached End-of-File when reading from
>>
>> or:
>> unexpectedly found End-of-File when reading from
>>
>> This is not something that I've ever run into personally, but I'm looking
>> through the code, trying to understand what may cause this.
>>
>> Also, if you look at the files in
>> /data/nifi/flowfile_repository/partition-8,
>> how many files are there in there, and how large are they?
>>
>> Thanks
>> -Mark
>>
>>
>>
>> On Sep 6, 2017, at 12:22 PM, Joe Gresock <jgresock@gmail.com> wrote:
>>
>> 1.1.0, it's not on a system I can copy/paste from, but here's part of the
>> stack trace:
>>
>> at org.wali.MinimalLockingWriteAheadLog$Partition.endRecovery(MinimalLockingWriteAheadLog.java:1047) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>> at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:487) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>> at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>>
>> On Wed, Sep 6, 2017 at 4:13 PM, Mark Payne <markap14@hotmail.com> wrote:
>>
>> Joe,
>>
>> What version of NiFi are you running? Do you have a stack trace?
>>
>> Thanks
>> -Mark
>>
>>
>> On Sep 6, 2017, at 11:59 AM, Joe Gresock <jgresock@gmail.com> wrote:
>>
>> I'm wondering if there is a way to recover from this scenario:
>>
>> ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow
>> from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to
>> connect node to cluster due to: java.lang.IllegalStateException: Signaled
>> end to recovery, but there are more recovery files for Partition in
>> directory /data/nifi/flowfile_repository/partition-8
>>
>> I have nearly a TB of files in my content_repository, so I'd really like to
>> be able to salvage this node, but I'm not sure how to proceed, as the node
>> won't start up.
>>
>> --
>> I know what it is to be in need, and I know what it is to have plenty.
>> I
>> have learned the secret of being content in any and every situation,
>> whether well fed or hungry, whether living in plenty or in want.  I can
>> do
>> all this through him who gives me strength.    *-Philippians 4:12-13*



-- 
I know what it is to be in need, and I know what it is to have plenty.  I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want.  I can do
all this through him who gives me strength.    *-Philippians 4:12-13*
