manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF + Postgresql - long freeze on job
Date Mon, 11 Feb 2019 13:49:21 GMT
There is no such specific value.  But you can practically disable this
entirely by setting a very large value, e.g. 2000000000.

Karl
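
As an illustration of the suggestion above, here is a minimal sketch of what the
properties.xml entries might look like. The reindex property name and the
intrinsiclink table come from this thread; the analyze property name is an
assumption based on the same naming pattern and should be checked against the
"Advanced properties.xml properties" table for your release.

```xml
<!-- Sketch only: add alongside the other <property> elements in properties.xml. -->

<!-- Reindex threshold for the intrinsiclink table (property name from this thread).
     A very large value means the threshold is practically never reached. -->
<property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink" value="2000000000" />

<!-- ASSUMPTION: the analyze threshold uses the same pattern with "analyze"
     in place of "reindex"; verify the exact name in the documentation. -->
<property name="org.apache.manifoldcf.db.postgres.analyze.intrinsiclink" value="2000000000" />
```

The same pair of entries would be repeated for each table, replacing
intrinsiclink with the table name. Note that with these thresholds raised,
keeping statistics and indexes healthy depends entirely on external
maintenance such as the daily vacuum mentioned below.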

On Mon, Feb 11, 2019 at 7:43 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
Daniel.Lirot@developpement-durable.gouv.fr> wrote:

> Hi,
>
> We have seen the table "Advanced properties.xml properties", and we use it
> to set:
>   <property
> name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink"
> value="5000000" />
> for the intrinsiclink table, and we do the same for the other tables.
> But is there a value that disables the reindex and the analyze entirely,
> for example "-1" or "0"? I didn't find one in the documentation.
>
> Thank you
>
>
> On 11/02/2019 at 12:26, Karl Wright (via the Internet, sent by
> user-return-5690-daniel.lirot=developpement-durable.gouv.fr@manifoldcf.apache.org)
> wrote:
>
> See:
> https://manifoldcf.apache.org/release/release-1.10/en_US/how-to-build-and-deploy.html#file+properties
>
> Look at the table "Advanced properties.xml properties"
>
> Karl
>
>
> On Mon, Feb 11, 2019 at 4:16 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
> Daniel.Lirot@developpement-durable.gouv.fr> wrote:
>
>> Hello,
>>
>> 1/ The database we use is Postgresql version 9.6
>>
>> 2/ I will look at what is happening about the queries in the logs.
>>
>> 3/ We run a VACUUM FULL ANALYZE every 24 hours, and for each table we set
>> the reindex threshold to 5000000 (in properties.xml) with a line such as:
>>  <property name="org.apache.manifoldcf.db.postgres.reindex.intrinsiclink"
>> value="5000000" />
>>
>> Is there a setting that disables the reindex requested by
>> ManifoldCF?
>>
>> thanks
>>
>> Daniel
>>
>>
>> On 08/02/2019 at 16:00, Karl Wright (via the Internet, sent by
>> user-return-5674-daniel.lirot=developpement-durable.gouv.fr@manifoldcf.apache.org)
>> wrote:
>>
>> Hello,
>>
>> (1) What database are you using for this?  Some databases require
>> maintenance periodically or have other heavy usage constraints.
>> (2) Every time a query takes more than a minute to execute, it is
>> logged, along with the query plan.  You need to look at the manifoldcf log
>> to see which queries are problematic before concluding anything.
>> (3) For every database table, you can individually configure how many
>> table operations approximately occur before MCF re-analyzes the table.
>> However, it's likely that you have the opposite problem: a bad query plan
>> for the query that queues documents for processing.  Preventing that may
>> require more frequent analysis, not less.  But we cannot tell until we
>> understand which queries are taking a long time.
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Fri, Feb 8, 2019 at 8:07 AM LIROT Daniel - SG/SPSSI/CPII/DOSO/ET <
>> Daniel.Lirot@developpement-durable.gouv.fr> wrote:
>>
>>> Hello,
>>>
>>> We use ManifoldCF v2.10 with PostgreSQL (9.6) to crawl our websites.
>>> This represents approximately 1.2 million documents.
>>> We split the crawl into 4 jobs that distribute their results across 3 Solr
>>> collections.
>>> The crawl performs well up to 500000 documents (25000 to 30000 docs /
>>> hour), then performance drops sharply as it progresses. We observe
>>> very long freezes, to the point that the crawl appears to have stopped.
>>> We suspect a reindexing, notably of the intrinsiclink table, which is
>>> very large (85 million rows).
>>> Is it possible to disable the re-indexing controlled by ManifoldCF?
>>> Any other ideas?
>>>
>>> best Regards
>>> LIROT daniel
>>> --
>>>
>>
>>
>
