manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Manifoldcf - Job Deletion Process
Date Tue, 29 Oct 2019 10:10:31 GMT
ManifoldCF is an incremental crawler, which means that on every
(non-continuous) job run it sees which documents it can find and removes
the ones it can't.  The history for the documents being deleted should tell
you why they are being deleted -- it may be that (a) they weren't found, or
(b) that the document specification in the job changed and they are no
longer included in the job.

Karl


On Tue, Oct 29, 2019 at 5:30 AM Priya Arora <priya@smartshore.nl> wrote:

> Hi All,
>
> I have a query regarding the ManifoldCF job process. I have a job that
> crawls an intranet site.
> Repository type: Web
> Output connector type: Elasticsearch
>
> The job has to crawl around 4-5 lakh (400,000-500,000) records in total.
> I discarded the previous index, created a new index in Elasticsearch with
> proper mappings and settings, and started the job again after also
> cleaning the database (PostgreSQL).
> While the job runs, it ingests the records properly, but just before
> finishing (and sometimes in between) it starts deleting documents, and it
> does not index the deleted documents again.
>
> Can you please tell me if I am doing anything wrong? Or is this part of
> how ManifoldCF works? If so, why are the documents not ingested again?
>
> Thanks and regards
> Priya
>
>
