manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Error: Unexpected jobqueue status - record id X, expecting active status, saw 4 (MySQL compatible Database)
Date Fri, 07 Jun 2019 07:28:29 GMT
I just reviewed the places where JobQueue status is updated to be sure that
all places this happens are included in the diagnostics trackiing.  They
were, save for when a cluster gets restarted, and I added that to the code
in trunk as well.

So the analysis, above, stands.

:-(

Karl


On Thu, Jun 6, 2019 at 4:28 PM Karl Wright <daddywri@gmail.com> wrote:

> I just looked at this briefly.
>
> For
> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204,
> look for the following record ID:
>
> 1522147023170
>
> This is the one when we look at it after the document is processed that it has status
4:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.380401275Z 2019-06-06T04:01:05,380 DEBUG [Worker
thread '37'] org.apache.manifoldcf.diagnostics: ==== Forensics for record 1522147023170, current
status: 4 ====
>
> manifoldcf_1     | 2019-06-06T02:01:05.380512360Z 2019-06-06T04:01:05,380 DEBUG [Worker
thread '37'] org.apache.manifoldcf.diagnostics: === Current stack trace ===
> manifoldcf_1     | 2019-06-06T02:01:05.380544192Z java.lang.Exception: Forensics stack
trace
> manifoldcf_1     | 2019-06-06T02:01:05.380548494Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.printForensics(TrackerClass.java:216)
[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380552564Z 	at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1028)
[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.380568210Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3414)
[mcf-pull-agent.jar:?]
>
> <<<<<<
>
> It expects it to have status 5, though, because the only way the document could be being
worked on at all is if the document had been put into status 5 by the stuffer thread:
>
> >>>>>>
>
> manifoldcf_1     | 2019-06-06T02:01:05.387210733Z 2019-06-06T04:01:05,381 DEBUG [Worker
thread '37'] org.apache.manifoldcf.diagnostics: == About to commit transaction by 'Stuffer
thread' at 1559786464972 ==
> manifoldcf_1     | 2019-06-06T02:01:05.387250827Z java.lang.Exception: Precommit stack
trace
> manifoldcf_1     | 2019-06-06T02:01:05.387256464Z 	at org.apache.manifoldcf.crawler.jobs.TrackerClass.notePrecommit(TrackerClass.java:108)
~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387260987Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.fetchAndProcessDocuments(JobManager.java:3295)
~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387265034Z 	at org.apache.manifoldcf.crawler.jobs.JobManager.getNextDocuments(JobManager.java:3014)
~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387269073Z 	at org.apache.manifoldcf.crawler.system.StufferThread.run(StufferThread.java:186)
~[mcf-pull-agent.jar:?]
> manifoldcf_1     | 2019-06-06T02:01:05.387285663Z 2019-06-06T04:01:05,387 DEBUG [Worker
thread '37'] org.apache.manifoldcf.diagnostics:     Transaction includes:
> manifoldcf_1     | 2019-06-06T02:01:05.387290117Z 2019-06-06T04:01:05,387 DEBUG [Worker
thread '37'] org.apache.manifoldcf.diagnostics:       Record 1522147023170 status modified
to 5: Make active
>
> ...
>
> <<<<<<
>
> The actual commit follows a short while later to actually set the status, and only AFTER
that are the documents queued to be handed to the worker threads.
>
> So what this is saying is that the stuffer thread sets the document status to 5 and then
a worker thread gets it and when the worker thread is done it tries to set the status back
to 4 -- but it's ALREADY at 4 and that is not possible.
>
> Now, it may be worth examining the possibility that some other job gets to the document
while it is waiting to be queued, and does stuff to the document's jobqueue record.  There
are a couple of background threads which (for example) reprioritize documents that might conceivably
change the jobqueue record's status.  They generally work by changing the record status in
a transaction to some other value that would not make the record eligible for queuing, and
then changing it back when done.  But unless completely overlooked, these would, however,
be visible in the diagnostics, and furthermore it should not be possible for one thread in
a transaction to change the status at the same time as another thread in a transaction changes
the status.
>
> So either way I'm forced to conclude that a record gets changed and committed, but the
change is not visible to another thread later trying to reset the status.
>
> Karl
>
>
>
>
> On Thu, Jun 6, 2019 at 5:58 AM Karl Wright <daddywri@gmail.com> wrote:
>
>> I can look at the log output but not until this weekend.
>>
>> Karl
>>
>>
>> On Thu, Jun 6, 2019 at 3:59 AM Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>>> Hi Olivier,
>>>
>>>
>>>
>>> we were not able to fix this yet.
>>>
>>>
>>>
>>> But now we have new diagnostics log data, after the error occurred again
>>> yesterday:
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1522147023170, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/16f617eed064b004cb69a2f929b08ca8/raw/46b9585bf73bb7eaf5e3b28aeb87610525afc9ca/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201522147023170,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> Unexpected jobqueue status - record id 1541529864711, expecting active
>>> status, saw 4
>>>
>>>
>>> https://gist.githubusercontent.com/schuch/d73672e44b0edd5a9184170c9ee417a5/raw/f55b1c023056ef8e310d99489a0b6f9dfcaaa585/Unexpected%2520jobqueue%2520status%2520-%2520record%2520id%25201541529864711,%2520expecting%2520active%2520status,%2520saw%25204
>>>
>>>
>>>
>>> @Karl, maybe you can get some ideas from that log data?
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>>
>>> *Von:* Olivier Tavard <olivier.tavard@francelabs.com>
>>> *Gesendet:* Dienstag, 21. Mai 2019 11:00
>>> *An:* user@manifoldcf.apache.org
>>> *Betreff:* Re: Error: Unexpected jobqueue status - record id X,
>>> expecting active status, saw 4 (MySQL compatible Database)
>>>
>>>
>>>
>>> Hi Markus,
>>>
>>>
>>>
>>> We have the same error (with postgresql database). Did the error occur
>>> again since your last mail ?
>>>
>>> Did you change something on your MCF configuration to fix this ?
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Best regards,
>>>
>>>
>>>
>>> Olivier
>>>
>>>
>>>
>>>
>>>
>>> Le 13 févr. 2019 à 13:58, Markus Schuch <markus.schuch@deutschebahn.com>
>>> a écrit :
>>>
>>>
>>>
>>> Hi Karl,
>>>
>>>
>>>
>>> we set the diagnostigs logger to level debug.
>>>
>>>
>>>
>>> I will get back when the error occurs again.
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Markus
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie hier:
>>>  http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>> Le 12 févr. 2019 à 17:41, Karl Wright <daddywri@gmail.com> a écrit :
>>>
>>>
>>>
>>> Hi Marcus,
>>>
>>>
>>>
>>> There's a properties.xml debugging logger you can enable that will keep
>>> track of what's happening with transactions, so that when an error of this
>>> kinds is reported, information about why the situation is unexpected is
>>> dumped to the log.  The logger is called "diagnostics" e.g.
>>> "org.apache.manifoldcf.diagnostics".
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 10:53 AM Markus Schuch <
>>> markus.schuch@deutschebahn.com> wrote:
>>>
>>> Hi,
>>>
>>> we are seeing "Error: Unexpected jobqueue status - record id
>>> 1484612513829, expecting active status, saw 4" from time to time.
>>> I didn’t report it, because we were running on an old MCF version, and
>>> there were some bugreports relating to this error that are resolved in
>>> newer versions.
>>>
>>> Now we have upgraded to latest and greatest and the error still
>>> occurred. So want to start to track this down.
>>>
>>> Out setup is:
>>>
>>>    - ManifoldCF 2.12, running in a Docker Container based on Redhat
>>>    Linux, OpenJDK 8
>>>    - AWS RDS Database (Aurora MySQL -> 5.6 compatible)
>>>    - Single Process Setup
>>>
>>>
>>> We run several Jobs (run once mode) over night with schedule windows and
>>> max runtime settings. Some of the time windows are overlapping.
>>> At normal days some Jobs finish during their time window, some go to
>>> WAITING and will go on in the next night.
>>>
>>> The error mostly hits at a Sharepoint Repository Job.
>>> We have the impression, that the error is somehow related to situations,
>>> when service interruptions (e.g. connection issues) occur in other jobs.
>>>
>>>
>>>
>>> Maybe all this is related to fundamental problems with the database or
>>> the JDBC driver
>>>
>>> (Karl expressed his concerns about that in the maybe-related
>>> https://issues.apache.org/jira/browse/CONNECTORS-1581 ticket)
>>>
>>> On the other hand, there were similar bugreports in the past concerning
>>> unexpected jobqueue status and there were resolved.
>>>
>>> Mayby there is a chance this can also be analyzed and fixed, but I do
>>> not really know where to start.
>>>
>>>
>>> Has anybody ideas how we can track this down / debug this to get to the
>>> bottom of the problem?
>>> What could be the first step in the analysis?
>>>
>>> Many thanks in advance
>>> Markus
>>>
>>> May be related bugreports:
>>>
>>>
>>>
>>> [0] https://issues.apache.org/jira/browse/CONNECTORS-1395
>>> [1] https://issues.apache.org/jira/browse/CONNECTORS-1180
>>> [2] https://issues.apache.org/jira/browse/CONNECTORS-590
>>> [3] https://issues.apache.org/jira/browse/CONNECTORS-246
>>>
>>>
>>>
>>> --
>>>
>>> Markus Schuch
>>>
>>> Web Business (T.IPB 26)
>>>
>>>
>>>
>>> DB Systel GmbH
>>>
>>> Jürgen-Ponto-Platz 1, 60329 Frankfurt a. Main
>>>
>>>
>>> ------------------------------
>>>
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190211>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> Pflichtangaben anzeigen
>>> <http://www.deutschebahn.com/pflichtangaben/20190529>
>>>
>>> Nähere Informationen zur Datenverarbeitung im DB-Konzern finden Sie
>>> hier: http://www.deutschebahn.com/de/konzern/datenschutz
>>>
>>

Mime
View raw message