manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF + Postgresql - long freeze on job
Date Mon, 11 Feb 2019 11:26:43 GMT
See:
https://manifoldcf.apache.org/release/release-1.10/en_US/how-to-build-and-deploy.html#file+properties

Look at the table "Advanced properties.xml properties"
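
For illustration, here is a minimal sketch of what per-table entries in properties.xml might look like. The reindex property name is the one quoted below in this thread; the analyze property name assumes the org.apache.manifoldcf.db.postgres.analyze.<tablename> pattern described in that table, and both values are placeholders to be tuned, not recommendations. Raising a threshold only makes the operation rarer; the sketch does not assume it can be switched off outright.

```xml
<!-- Hypothetical values: tune them against what the ManifoldCF log shows. -->
<!-- Number of modifications to intrinsiclink before MCF issues a REINDEX. -->
<property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink"
          value="5000000" />
<!-- Assumed name pattern: number of modifications before MCF re-ANALYZEs the table. -->
<property name="org.apache.manifoldcf.db.postgres.analyze.intrinsiclink"
          value="50000" />
```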

Karl


On Mon, Feb 11, 2019 at 4:16 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
Daniel.Lirot@developpement-durable.gouv.fr> wrote:

> Hello,
>
> 1/ The database we use is PostgreSQL version 9.6.
>
> 2/ I will check the logs to see what is happening with the queries.
>
> 3/ We run a VACUUM FULL ANALYZE every 24 hours; for each table we set
> the reindex threshold to 5000000 (in properties.xml) with the line:
>  <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink"
> value="5000000" />
>
> Is there a setting that disables the reindex requested by
> ManifoldCF?
>
> Thanks,
>
> Daniel
>
>
> On 08/02/2019 at 16:00, Karl Wright (via Internet, sent by
> user-return-5674-daniel.lirot=developpement-durable.gouv.fr@manifoldcf.apache.org)
> wrote:
>
> Hello,
>
> (1) What database are you using for this?  Some databases require
> maintenance periodically or have other heavy usage constraints.
> (2) Every time a query takes more than a minute to execute, it is logged,
> along with the query plan.  You need to look at the manifoldcf log to see
> which queries are problematic before concluding anything.
> (3) For every database table, you can individually configure how many
> table operations approximately occur before MCF re-analyzes the table.
> However, it's likely that you have the opposite problem: a bad query plan
> for the query that queues documents for processing.  Preventing that may
> require more frequent analysis, not less.  But we cannot tell until we
> understand which queries are taking a long time.
>
> Thanks,
> Karl
>
>
>
> On Fri, Feb 8, 2019 at 8:07 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
> Daniel.Lirot@developpement-durable.gouv.fr> wrote:
>
>> Hello,
>>
>> We use ManifoldCF v2.10 with PostgreSQL (9.6) to crawl our websites,
>> which represent approximately 1.2 million documents.
>> We split the crawl into 4 jobs that distribute their results across 3 Solr
>> collections.
>> The crawl performs well up to about 500000 documents (25000 to 30000
>> docs / hour), then performance drops sharply as it progresses. We observe
>> very long freezes, long enough that the crawl appears to have stopped.
>> We suspect a reindexing, notably of the intrinsiclink table, which is
>> very large (85 million rows).
>> Is it possible to prevent the re-indexing controlled by ManifoldCF?
>> Any other ideas?
>>
>> Best regards,
>> LIROT Daniel
>> --
>>
>
>
