nifi-dev mailing list archives

From Joe Gresock <jgres...@gmail.com>
Subject Re: Recovery failure
Date Wed, 06 Sep 2017 17:18:04 GMT
Thanks Mark, that's the kind of thing I was looking for; this gives me a
good starting point.

Joe

On Wed, Sep 6, 2017 at 5:09 PM, Mark Payne <markap14@hotmail.com> wrote:

> Joe,
>
> If you wanted to go the route of truncating it, I would recommend starting
> with the nifi-toolkit-flowfile-repo module and updating that. It already has
> all the dependencies in place to read the repository and update it. You
> would just read each transaction from a partition and write it to a new file
> until you hit the EOFException, and then discard that transaction.
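A minimal Java sketch of the truncation idea above (copy each complete transaction to a new file, stop at the EOFException, drop the partial one). It assumes a hypothetical record framing of a 4-byte length prefix followed by the payload; the real MinimalLockingWriteAheadLog journals are written through a SerDe with a different layout, so this only illustrates the copy-until-EOFException technique. The class and method names are made up for illustration and are not part of nifi-toolkit-flowfile-repo.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical helper: copies complete records from a journal file to a new
 * file and drops a trailing record that was only partially written.
 * ASSUMPTION: each record is framed as [int length][length bytes]; this is
 * NOT the on-disk format of org.wali.MinimalLockingWriteAheadLog.
 */
final class PartitionTruncator {

    private PartitionTruncator() {}

    /** Returns the number of complete records copied from source to target. */
    static int copyCompleteRecords(Path source, Path target) throws IOException {
        int copied = 0;
        try (DataInputStream in = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(source)));
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(target)))) {
            byte[] payload;
            // Stop at a clean end-of-file or at the first truncated record.
            while ((payload = readRecordOrNull(in)) != null) {
                out.writeInt(payload.length);
                out.write(payload);
                copied++;
            }
        }
        return copied;
    }

    /** Reads one framed record, or returns null if the stream ends before it is complete. */
    private static byte[] readRecordOrNull(DataInputStream in) throws IOException {
        try {
            int length = in.readInt();          // EOFException if fewer than 4 bytes remain
            if (length < 0) {
                throw new IOException("Corrupt record length: " + length);
            }
            byte[] payload = new byte[length];
            in.readFully(payload);              // EOFException if the payload was cut short
            return payload;
        } catch (EOFException e) {
            return null;                        // partial (or absent) record: discard it
        }
    }
}
```

Keeping the EOFException handling inside a single read means a corrupt (negative) length still surfaces as an IOException rather than being silently treated as the end of the data.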
>
> The other option, not assuming that an EOFException means the partition is
> out of data, would mean updating MinimalLockingWriteAheadLog (in the
> nifi-commons/nifi-write-ahead-log module). Around lines 472-479, update the
> logic so that if an Exception is caught there, we call
> nextPartition.getNextRecoverableTransactionId() again if the partition does
> actually have more data (this may require adding some sort of
> isRecoveryDataAvailable() method or something like that on the Partition
> class).
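A minimal sketch of that second option, written against a hypothetical `RecoverablePartition` interface rather than the real org.wali classes: `nextTransactionId()` stands in for `getNextRecoverableTransactionId()`, and `isRecoveryDataAvailable()` is the hypothetical method suggested above. The loop treats an EOFException as "skip this transaction" and only stops when the partition has no data left. `InMemoryPartition` is an illustrative stand-in for trying the loop, not a NiFi class.

```java
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical partition abstraction for illustration; NOT the org.wali API. */
interface RecoverablePartition {
    /**
     * Returns the next transaction ID, or null once the partition is exhausted.
     * Throws EOFException if the current transaction was truncated; the
     * partition then moves past the damaged journal file.
     */
    Long nextTransactionId() throws IOException;

    /** Hypothetical method suggested in the thread: is there any data left to recover? */
    boolean isRecoveryDataAvailable();
}

final class RecoverySketch {
    private RecoverySketch() {}

    /** Replays every readable transaction; returns how many truncated ones were skipped. */
    static int recover(RecoverablePartition partition, LongConsumer replay) throws IOException {
        int skipped = 0;
        while (true) {
            Long id;
            try {
                id = partition.nextTransactionId();
            } catch (EOFException e) {
                skipped++;
                // Old assumption: EOFException means no more data, so recovery stops
                // here and endRecovery() later complains about leftover files.
                // Instead, keep going if the partition still has data.
                if (partition.isRecoveryDataAvailable()) {
                    continue;
                }
                break;
            }
            if (id == null) {
                break;
            }
            replay.accept(id);
        }
        return skipped;
    }
}

/** In-memory stand-in: each inner list is one journal file; -1 marks a truncated transaction. */
final class InMemoryPartition implements RecoverablePartition {
    private final Deque<Deque<Long>> files = new ArrayDeque<>();

    InMemoryPartition(List<List<Long>> journalFiles) {
        for (List<Long> file : journalFiles) {
            files.addLast(new ArrayDeque<>(file));
        }
    }

    @Override
    public Long nextTransactionId() throws IOException {
        while (!files.isEmpty()) {
            Deque<Long> current = files.peekFirst();
            if (current.isEmpty()) {
                files.removeFirst();
                continue;
            }
            long id = current.removeFirst();
            if (id < 0) {
                files.removeFirst(); // the rest of a damaged file is unreadable
                throw new EOFException("Truncated transaction");
            }
            return id;
        }
        return null;
    }

    @Override
    public boolean isRecoveryDataAvailable() {
        return files.stream().anyMatch(file -> !file.isEmpty());
    }
}
```

With journal files `[1, 2, <truncated>]` and `[3, 4]`, this loop replays 1, 2, 3 and 4 and skips one transaction, whereas stopping at the first EOFException would leave the second file unread.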
>
> Does this help?
>
> Thanks
> -Mark
>
>
> On Sep 6, 2017, at 1:01 PM, Joe Gresock <jgresock@gmail.com> wrote:
>
> Sorry, 144 was a typo; there are 14 files.
>
> Yes, it appears to have run out of disk space, so that's probably the root
> cause.  Can you give me any ideas on how to carry out either of your two
> suggestions?  How would I look for the end of a record, so as to truncate it?
>
> On Wed, Sep 6, 2017 at 4:55 PM, Mark Payne <markap14@hotmail.com> wrote:
>
> Hmmm, OK, interesting... Once it hits an EOFException, it assumes that
> there is no more data in the partition. Clearly there is, because it then
> fails when calling endRecovery(). Did you perhaps run out of disk space on
> your FlowFile Repo while it was running, or hit an OutOfMemoryError? Perhaps
> that cut one transaction short, which causes the EOFException on recovery,
> and then writing continued after it.
>
> The fact that there are 144 files in that directory is also very odd...
> there are generally only 1-2 files in that directory. Do all of your
> partitions have that many files? Any errors before the restart about not
> being able to checkpoint the FlowFile Repo?
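To check every partition at once, a small Java sketch using java.nio.file can list each `partition-*` directory with its file count and total size, sorted so partitions with unusually many files come first. It assumes the repository layout seen in this thread (`/data/nifi/flowfile_repository/partition-N`); `PartitionInspector` is a hypothetical name, not a NiFi tool.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: file count and total size for each partition-* directory. */
final class PartitionInspector {
    private PartitionInspector() {}

    static final class Summary {
        final String partition;
        final int fileCount;
        final long totalBytes;

        Summary(String partition, int fileCount, long totalBytes) {
            this.partition = partition;
            this.fileCount = fileCount;
            this.totalBytes = totalBytes;
        }

        @Override
        public String toString() {
            return partition + ": " + fileCount + " files, " + totalBytes + " bytes";
        }
    }

    /** Summarizes every partition-* directory under repoDir, most files first. */
    static List<Summary> summarize(Path repoDir) throws IOException {
        List<Summary> result = new ArrayList<>();
        try (DirectoryStream<Path> partitions = Files.newDirectoryStream(repoDir, "partition-*")) {
            for (Path partition : partitions) {
                if (!Files.isDirectory(partition)) {
                    continue;
                }
                int count = 0;
                long bytes = 0;
                try (DirectoryStream<Path> files = Files.newDirectoryStream(partition)) {
                    for (Path file : files) {
                        if (Files.isRegularFile(file)) {
                            count++;
                            bytes += Files.size(file);
                        }
                    }
                }
                result.add(new Summary(partition.getFileName().toString(), count, bytes));
            }
        }
        // Partitions with unusually many files sort to the top.
        result.sort(Comparator.comparingInt((Summary s) -> s.fileCount).reversed());
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Default path taken from this thread; pass another repository path as the first argument.
        Path repo = Paths.get(args.length > 0 ? args[0] : "/data/nifi/flowfile_repository");
        for (Summary s : summarize(repo)) {
            System.out.println(s);
        }
    }
}
```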
>
> At this point, I'm not entirely sure what can be done, other than perhaps
> trying to manually truncate the last record in the Partition that is
> causing the EOFException. Or perhaps the MinimalLockingWriteAheadLog could
> be updated to not assume that an EOFException implies that the partition no
> longer has data in it. Unfortunately, though, I'm not seeing any easy
> workaround.
>
> On Sep 6, 2017, at 12:37 PM, Joe Gresock <jgresock@gmail.com> wrote:
>
> Yes, I do see:
> ERROR [main] org.wali.MinimalLockingWriteAheadLog
> org.wali.MinimalLockingWriteAheadLog@1e620fe7 unexpectedly reached
> End-of-File when reading from Partition-214 for Transaction ID 1918212626;
> assuming crash and ignoring this transaction.
>
> In that directory, I see 144 files, totalling ~120MB.  The first two files
> are multi-megabyte files, and the other 12 are all either 7K or 4K.
>
> On Wed, Sep 6, 2017 at 4:30 PM, Mark Payne <markap14@hotmail.com> wrote:
>
> Joe,
>
> Any other errors in the logs? Specifically, I'm looking for errors that
> contain the text:
> unexpectedly reached End-of-File when reading from
>
> or:
> unexpectedly found End-of-File when reading from
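A Java sketch for pulling those messages out of a log, assuming the log is a plain-text file such as NiFi's `logs/nifi-app.log` (that default path is an assumption; adjust it for your install). It returns every line containing either of the two strings above. `EofLogScanner` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: returns log lines that mention a truncated write-ahead-log transaction. */
final class EofLogScanner {
    private EofLogScanner() {}

    // The two message fragments quoted in this thread.
    static final List<String> MARKERS = Arrays.asList(
            "unexpectedly reached End-of-File when reading from",
            "unexpectedly found End-of-File when reading from");

    static List<String> findEofErrors(Path logFile) throws IOException {
        // ISO-8859-1 maps every byte to a char, so stray non-UTF-8 bytes cannot
        // abort the scan; the markers themselves are plain ASCII.
        try (Stream<String> lines = Files.lines(logFile, StandardCharsets.ISO_8859_1)) {
            return lines
                    .filter(line -> MARKERS.stream().anyMatch(line::contains))
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // ASSUMPTION: default NiFi log location; pass another path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/nifi-app.log");
        findEofErrors(log).forEach(System.out::println);
    }
}
```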
>
> This is not something that I've ever run into personally, but I'm looking
> through the code, trying to understand what may cause this.
>
> Also, if you look at the files in /data/nifi/flowfile_repository/partition-8,
> how many files are there, and how large are they?
>
> Thanks
> -Mark
>
>
>
> On Sep 6, 2017, at 12:22 PM, Joe Gresock <jgresock@gmail.com> wrote:
>
> 1.1.0. It's not on a system I can copy/paste from, but here's part of the
> stack trace:
>
> at org.wali.MinimalLockingWriteAheadLog$Partition.endRecovery(MinimalLockingWriteAheadLog.java:1047) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
> at org.wali.MinimalLockingWriteAheadLog.recoverFromEdits(MinimalLockingWriteAheadLog.java:487) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
> at org.wali.MinimalLockingWriteAheadLog.recoverRecords(MinimalLockingWriteAheadLog.java:301) ~[nifi-write-ahead-log-1.1.0.jar:1.1.0]
>
> On Wed, Sep 6, 2017 at 4:13 PM, Mark Payne <markap14@hotmail.com> wrote:
>
> Joe,
>
> What version of NiFi are you running? Do you have a stack trace?
>
> Thanks
> -Mark
>
>
> On Sep 6, 2017, at 11:59 AM, Joe Gresock <jgresock@gmail.com> wrote:
>
> I'm wondering if there is a way to recover from this scenario:
>
> ERROR [main] o.a.nifi.controller.StandardFlowService Failed to load flow
> from cluster due to: org.apache.nifi.cluster.ConnectionException: Failed to
> connect node to cluster due to: java.lang.IllegalStateException: Signaled
> end to recovery, but there are more recovery files for Partition in
> directory /data/nifi/flowfile_repository/partition-8
>
> I have nearly a TB of files in my content_repository, so I'd really like to
> be able to salvage this node, but I'm not sure how to proceed, as the node
> won't start up.
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*
>
>


