manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF + Postgresql - long freeze on job
Date Fri, 08 Feb 2019 15:00:28 GMT
Hello,

(1) What database are you using for this?  Some databases require
maintenance periodically or have other heavy usage constraints.
(2) Every time a query takes more than a minute to execute, it is logged,
along with the query plan.  You need to look at the manifoldcf log to see
which queries are problematic before concluding anything.
(3) For every database table, you can individually configure approximately
how many table operations occur before MCF re-analyzes the table.  However,
it's likely that you have the opposite problem: a bad query plan for the
query that queues documents for processing.  Preventing that may require
more frequent analysis, not less.  But we cannot tell until we understand
which queries are taking a long time.
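
As a rough illustration of point (2), a small script along these lines could
pull the long-running queries out of the log and total the time spent per
query.  It is only a sketch: it assumes the warnings contain the text
"Found a long-running query (N ms): [...]", and the function names and log
path are made up for the example, so check the pattern against your own
manifoldcf log before relying on it.

```python
import re
from collections import Counter

# Assumed wording of the slow-query warning; adjust it to match your log.
SLOW_QUERY = re.compile(r"Found a long-running query \((\d+) ms\): \[(.*)\]")


def find_slow_queries(lines):
    """Return a (duration_ms, query_text) tuple for each matching log line."""
    hits = []
    for line in lines:
        match = SLOW_QUERY.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


def summarize(hits, top=5):
    """Total the logged time per query text, largest total first."""
    totals = Counter()
    for duration_ms, query in hits:
        totals[query] += duration_ms
    return totals.most_common(top)


def report(path):
    """Print the queries with the largest total logged time in one log file."""
    with open(path, encoding="utf-8", errors="replace") as log:
        for query, total_ms in summarize(find_slow_queries(log)):
            print(f"{total_ms:>10} ms  {query[:120]}")
```

Calling report("logs/manifoldcf.log") (a hypothetical path) would list the
queries that account for the most logged time; the query plans logged next to
them are what to look at afterwards.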

Thanks,
Karl



On Fri, Feb 8, 2019 at 8:07 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
Daniel.Lirot@developpement-durable.gouv.fr> wrote:

> Hello,
>
> We use ManifoldCF v2.10 with PostgreSQL 9.6 to crawl our websites,
> which represent approximately 1.2 million documents.
> We split the crawl into 4 jobs that distribute their results across 3 Solr
> collections.
> The crawl performs well up to 500,000 documents (25,000 to 30,000 docs / hour),
> then performance drops sharply as it progresses. We observe very long
> freezes, long enough that the crawl appears to have stopped.
> We suspect reindexing, notably of the intrinsiclink table, which is
> very large (85 million rows).
> Is it possible to prevent the re-indexing controlled by ManifoldCF?
> Any other ideas?
>
> best Regards
> LIROT daniel
> --
>
