manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Documents blocked sometimes without errors
Date Mon, 04 Jun 2018 09:33:10 GMT
Hi Maxence,

Pausing and restarting a job causes all of its documents to have their
docpriority field recalculated.  It should not be necessary to do this
in order to have the job complete, though.

All documents that are queued have their docpriority set at the time they
are added to the queue, but the docpriority they are given depends on how
many documents in the same document bin have already been given
docpriority values.  This is done to make sure documents from all bins are
given an equal chance of being crawled.  But since documents are given a
docpriority when queued, there may well have been plenty of other documents
"in front" of them that are already queued and must be processed before
there's any chance of getting crawled.  So it is possible that documents
from one job may appear to block documents from another -- but this will
eventually correct itself and those documents will be crawled.
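A toy model may make the queueing argument above concrete. The sketch below assumes a simplified rule: a document's priority is fixed when it is queued and equals the number of documents already prioritized in its bin, with lower values crawled first. ManifoldCF's real docpriority calculation is more involved; `BinPriorityQueue`, the job labels and the host-name bins are all hypothetical.

```python
import heapq
import itertools
from collections import defaultdict


class BinPriorityQueue:
    """Hypothetical toy model of bin-based priority assignment (not ManifoldCF code).

    Priority is assigned once, at enqueue time, as the count of documents that
    already received a priority in the same bin. Lower values are crawled
    first, so among documents waiting at the same time, each bin's first
    document comes before any bin's second.
    """

    def __init__(self):
        self._bin_counts = defaultdict(int)  # bin name -> docs already prioritized
        self._heap = []                      # entries: (priority, arrival order, doc_id)
        self._arrival = itertools.count()    # FIFO tie-breaker for equal priorities

    def enqueue(self, doc_id, bin_name):
        priority = self._bin_counts[bin_name]
        self._bin_counts[bin_name] += 1
        heapq.heappush(self._heap, (priority, next(self._arrival), doc_id))
        return priority

    def drain(self):
        """Pop every queued document and return the ids in crawl order."""
        order = []
        while self._heap:
            _, _, doc_id = heapq.heappop(self._heap)
            order.append(doc_id)
        return order


if __name__ == "__main__":
    q = BinPriorityQueue()
    # Job A queues three documents from the same host (bin) first.
    for doc in ("A-1", "A-2", "A-3"):
        q.enqueue(doc, "host-x")
    # Job B later queues one document from host-x and one from host-y.
    print(q.enqueue("B-1", "host-x"))  # 3: three host-x docs are already "in front"
    print(q.enqueue("B-2", "host-y"))  # 0: host-y has no earlier documents
    print(q.drain())                   # ['A-1', 'B-2', 'A-2', 'A-3', 'B-1']
```

In this run `B-2` is crawled right after `A-1` because its bin was empty, while `B-1` waits behind all three earlier `host-x` documents. It is delayed rather than lost, which matches the self-correcting behavior described above.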

If you see *no* activity at all, however, then I wonder if somehow
documents have been queued with a null docpriority.  You can test this by
looking at the Document Status report and verifying that there is no reason
the documents should not be crawlable, and then looking in the database to
see what they have for their docpriority field.  Please let me know what
you find.
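The database check suggested above amounts to a query for a job's rows whose docpriority is NULL. A sketch follows. It assumes that the queue lives in a table named `jobqueue` with `id`, `jobid` and `status` columns; these names are assumptions, and only the `docpriority` field comes from this thread. The sketch runs against an in-memory SQLite stand-in with made-up rows so it can be tried without a ManifoldCF database. Against PostgreSQL through a driver such as psycopg2, the `?` placeholder would become `%s`.

```python
import sqlite3

# Hypothetical query: table "jobqueue" and columns "id", "jobid", "status" are
# assumed names; "docpriority" is the field discussed in this thread.
NULL_PRIORITY_SQL = """
    SELECT id, status
    FROM jobqueue
    WHERE jobid = ? AND docpriority IS NULL
    ORDER BY id
"""


def find_null_priority_docs(conn, job_id):
    """Return (id, status) rows for one job's documents that have no docpriority."""
    return conn.execute(NULL_PRIORITY_SQL, (job_id,)).fetchall()


if __name__ == "__main__":
    # In-memory stand-in for the real database, only to exercise the query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobqueue (id INTEGER PRIMARY KEY, jobid INTEGER, "
        "status TEXT, docpriority REAL)"
    )
    # Sample rows are invented; "P" is an arbitrary placeholder status value.
    conn.executemany(
        "INSERT INTO jobqueue (id, jobid, status, docpriority) VALUES (?, ?, ?, ?)",
        [(1, 42, "P", 1.5), (2, 42, "P", None), (3, 7, "P", None)],
    )
    print(find_null_priority_docs(conn, 42))  # [(2, 'P')]
```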

Thanks,
Karl

On Mon, Jun 4, 2018 at 4:20 AM msaunier <msaunier@citya.com> wrote:

> Hello Karl,
>
> Sometimes jobs get blocked on many documents, and I don’t know why because
> there are no errors. To unblock them, I pause and resume the job, and then
> it works. This does not happen every time, and it is never the same
> documents.
>
> We run a script at 8:55 PM, and it may be the cause of this problem. We
> created this script to avoid errors, because the SCO servers are rebooted
> at 9:00 PM and ManifoldCF reports an error if those servers are stopped.
>
> Script explanation:
>
> 1. Pause the current job at 8:55 PM
> 2. Stop ManifoldCF and wait
> 3. VACUUM FULL on PostgreSQL
> 4. REINDEX on PostgreSQL
> 5. Wait until 9:05 PM
> 6. Start ManifoldCF
> 7. Wait for ManifoldCF to start
> 8. Resume the job
>
> Do you have any idea how to resolve this problem? Is the problem the
> REINDEX or the VACUUM FULL?
>
> Thanks,
>
> Maxence
>
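For reference, the maintenance sequence Maxence describes can be written as an ordered routine, which makes it easy to check the ordering (the job is paused before ManifoldCF stops and resumed only after it is back up). This is only a sketch: `NightlySteps`, its fields and `run_nightly_window` are hypothetical placeholders for whatever the real script calls, no ManifoldCF or PostgreSQL API is invoked, and `now`/`sleep` are injectable so the timing can be tested.

```python
import datetime as dt
import time
from dataclasses import dataclass
from typing import Callable

Step = Callable[[], None]


@dataclass
class NightlySteps:
    """Hypothetical placeholders for the real actions; plug in your own calls."""
    pause_job: Step
    stop_manifoldcf: Step
    vacuum_full: Step
    reindex: Step
    start_manifoldcf: Step
    wait_for_manifoldcf: Step
    resume_job: Step


def run_nightly_window(steps, restart_at, now=dt.datetime.now, sleep=time.sleep):
    """Run the steps in order, blocking until restart_at before restarting."""
    steps.pause_job()            # 1. pause the current job (8:55 PM)
    steps.stop_manifoldcf()      # 2. stop ManifoldCF and wait for it to exit
    steps.vacuum_full()          # 3. VACUUM FULL on PostgreSQL
    steps.reindex()              # 4. REINDEX on PostgreSQL
    remaining = (restart_at - now()).total_seconds()
    if remaining > 0:            # 5. wait until 9:05 PM (skipped if already past)
        sleep(remaining)
    steps.start_manifoldcf()     # 6. start ManifoldCF
    steps.wait_for_manifoldcf()  # 7. wait until it is ready
    steps.resume_job()           # 8. resume the job


if __name__ == "__main__":
    calls = []

    def record(name):
        return lambda: calls.append(name)

    steps = NightlySteps(*(record(n) for n in (
        "pause", "stop", "vacuum", "reindex", "start", "wait", "resume")))
    fake_now = dt.datetime(2018, 6, 4, 20, 55)
    run_nightly_window(steps, fake_now.replace(hour=21, minute=5),
                       now=lambda: fake_now,
                       sleep=lambda s: calls.append(f"sleep {s:.0f}s"))
    print(calls)
```

If VACUUM FULL and REINDEX overrun past the restart time, the wait step is simply skipped rather than sleeping for a negative interval.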
