nifi-users mailing list archives

From Arnaud G <greatpat...@gmail.com>
Subject Re: Queue incoherent state
Date Tue, 16 May 2017 14:46:33 GMT
Hi again,

Unfortunately I didn't save the logs from when the node restarted, but I
don't remember anything in them that gave me a clue as to why the queue
was blocked.

I only have a few logs from the weekend, when the queues were in this
strange state:

2017-05-14 09:01:29,635 INFO [pool-12-thread-1]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@7bdc0ad3
checkpointed with 85016 Records and 18 Swap Files in 5447 milliseconds
(Stop-the-world time = 37 milliseconds, Clear Edit Logs time = 30 millis),
max Transaction ID 307183065

2017-05-14 09:04:48,056 INFO [Write-Ahead Local State Provider Maintenance]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@265c0752
checkpointed with 2 Records and 0 Swap Files in 22 milliseconds
(Stop-the-world time = 1 milliseconds, Clear Edit Logs time = 1 millis),
max Transaction ID 7

2017-05-14 09:05:37,737 INFO [pool-12-thread-1]
org.wali.MinimalLockingWriteAheadLog
org.wali.MinimalLockingWriteAheadLog@7bdc0ad3
checkpointed with 85016 Records and 18 Swap Files in 4677 milliseconds
(Stop-the-world time = 35 milliseconds, Clear Edit Logs time = 17 millis),
max Transaction ID 307183065

2017-05-14 09:11:50,435 INFO [pool-12-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository
Successfully checkpointed FlowFile Repository
with 85016 records in 4057 milliseconds


These logs report 85K records that were not visible when I listed the
queue (the listing always came back empty), and overall the queues were
reporting even more elements (over 100K).



As we can see, these records were stuck: two hours later they were still
there, while other records were flowing nicely through the cluster during
the weekend.


2017-05-14 11:01:12,839 INFO [pool-12-thread-1]
o.a.n.c.r.WriteAheadFlowFileRepository
Successfully checkpointed FlowFile Repository
with 85016 records in 3408 milliseconds
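
For anyone who wants to run the same comparison over a longer log, here is
a rough Python sketch. It assumes each checkpoint message sits on a single
line, as it normally does in the application log (the excerpts in this mail
are wrapped by the mail client). The function names and the one-hour
threshold are my own choices, not anything provided by NiFi, and an
unchanged record count is only a hint that records may be stuck, not proof.

```python
import re
from datetime import datetime, timedelta

# Matches the WriteAheadFlowFileRepository checkpoint message quoted above.
# \s* after "o.a.n.c.r." tolerates a wrap between package and class name.
CHECKPOINT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+INFO\s+\[[^\]]+\]\s+"
    r"o\.a\.n\.c\.r\.\s*WriteAheadFlowFileRepository\s+Successfully\s+"
    r"checkpointed\s+FlowFile\s+Repository\s+with\s+(\d+)\s+records"
)


def checkpoint_counts(log_text):
    """Return (timestamp, record_count) pairs, in log order."""
    return [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), int(count))
        for ts, count in CHECKPOINT.findall(log_text)
    ]


def stuck_since(counts, min_duration=timedelta(hours=1)):
    """Return (count, first_seen, last_seen) if the most recent non-zero
    count has stayed identical for at least min_duration, else None."""
    if not counts:
        return None
    last_ts, last_count = counts[-1]
    first_ts = last_ts
    # Walk backwards while the count is unchanged.
    for ts, count in reversed(counts):
        if count != last_count:
            break
        first_ts = ts
    if last_count > 0 and last_ts - first_ts >= min_duration:
        return last_count, first_ts, last_ts
    return None


# The two checkpoint lines from this mail, unwrapped onto single lines.
sample = (
    "2017-05-14 09:11:50,435 INFO [pool-12-thread-1] "
    "o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed "
    "FlowFile Repository with 85016 records in 4057 milliseconds\n"
    "2017-05-14 11:01:12,839 INFO [pool-12-thread-1] "
    "o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed "
    "FlowFile Repository with 85016 records in 3408 milliseconds\n"
)

result = stuck_since(checkpoint_counts(sample))
print(result)
```

On the two lines above, this flags 85016 records as unchanged between 09:11
and 11:01. A flow that happens to sit at a steady size would be flagged too,
so the result still has to be compared against what the queue listing shows.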


Regarding disk space, I don't think it ran out of space at any point.
I even have a local backup of the flowfile directory that I made before
emptying it.


I hope it helps.


Arnaud






On Tue, May 16, 2017 at 3:51 PM, Mark Payne <markap14@hotmail.com> wrote:

> Arnaud,
>
> Did you have any WARN or ERROR messages in the logs? I'm particularly
> interested in anything
> that mentions the word "Swap" or "swap" (i.e., regardless of case). Is it
> possible that the FlowFile Repository
> could have run out of disk space?
>
> Thanks
> -Mark
>
> On May 16, 2017, at 3:34 AM, Arnaud G <greatpatton@gmail.com> wrote:
>
> Hi Matt,
>
> Thanks for your reply!
>
> I finally solved the problem by deleting all the content in the flowfile
> directory, but here are my observations:
>
> 1) The problem was coming from one of the cluster nodes: when this node was
> out of the cluster, the queues were reporting 0 flowfiles.
> 2) The first time I restarted this node, about 20'000 flowfiles reappeared
> and were processed; every time I subsequently restarted this node, about
> 20-30k flowfiles were processed again (I was specifically monitoring only
> one queue, but it happened for multiple other queues)
> 3) After 3-4 reboots of this node the queue reported 90K elements and
> remained in this state despite multiple further restarts.
> 4) The flowfile directory on this node contained 200 MB of data
> 5) I tried to set up flowfile expiration but it didn't do anything to
> the queue status
> 6) I tried to change the backpressure threshold without any effect.
> 7) During the problem the queue was operating normally on the cluster, and
> flowfiles were flowing through it without any issue.
>
> Arnaud
>
>
>
> On Mon, May 15, 2017 at 10:39 PM, Matt Gilman <matt.c.gilman@gmail.com>
> wrote:
>
>> Sorry for the delayed response. Similar behavior has been reported by
>> some other users [1]. Does the connection have any back pressure threshold
>> configured? Can new flowfiles be enqueued? Do the expiration settings have
>> any effect?
>>
>> Lastly, if you restart the cluster does it claim the connection still has
>> flowfiles enqueued?
>>
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-3897
>>
>> On Fri, May 12, 2017 at 5:47 AM, Arnaud G <greatpatton@gmail.com> wrote:
>>
>>> Hi again!
>>>
>>> I currently have another issue with incoherent queue status.
>>>
>>> Following the upgrade to 1.2 of a cluster, I have a couple of queues
>>> that display through the GUI a high number of flowfiles.
>>>
>>> As the queues were not emptying despite tuning, I tried to list the
>>> content of a queue. This action reports that the queue contains no
>>> flowfiles, which is not expected as the GUI displays another value.
>>>
>>> If I try to empty the queue, I receive a message: 0 FlowFiles (0 bytes)
>>> out of 210'000 (92.71MB) were removed from the queue.
>>>
>>> And of course I cannot delete the queue, as this action reports that
>>> the queue is not empty.
>>>
>>> So somehow it seems that the queues are empty but that the current
>>> display of the queues doesn't reflect it (it is very likely that some data
>>> was lost during the upgrade procedure, as we had to reboot a few nodes to
>>> change the heap property)
>>>
>>> What would be the best method to restore a proper state and be able to
>>> edit the flow file again?
>>>
>>> Thank you!
>>>
>>> Arnaud
>>>
>>>
>>>
>>
>
>
