manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: "start minimal" option even deletes contents whose links are deleted
Date Wed, 24 Dec 2014 04:59:45 GMT
Hi Shigeki,

A minimal crawl does not guarantee that no documents are deleted.  It only
does the least amount of work possible given the model that the underlying
connector implements.  Often that just means skipping the "cleanup" phase at
the end of the job run, which is the phase that typically removes
no-longer-reachable documents.  But if, for instance, you are using the web
connector and you have hop count filtering enabled, then the framework keeps
track of hop counts and removes every document that exceeds the limit, and
that removal does not depend on the end-of-job cleanup phase.
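
For illustration only, here is a toy model of that distinction.  It is not
ManifoldCF code: run_job, hop_counts, the minimal flag and max_hops are all
hypothetical names, and the sketch assumes that a document whose inbound
links have disappeared is treated as having an unbounded hop count.

```python
import math
from collections import deque


def hop_counts(links, seeds):
    """Breadth-first hop distance of every document reachable from the seeds."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, ()):
            if target not in dist:
                dist[target] = dist[doc] + 1
                queue.append(target)
    return dist


def run_job(index, links, seeds, max_hops=None, minimal=False):
    """Return the previously indexed documents that survive one toy job run."""
    dist = hop_counts(links, seeds)
    kept = set(index)
    if max_hops is not None:
        # Hop count filtering is applied while crawling, so it also
        # happens in a minimal run.  Assumption: an unreachable document
        # counts as infinitely far away.
        kept = {d for d in kept if dist.get(d, math.inf) <= max_hops}
    if not minimal:
        # End-of-job cleanup: drop documents that are no longer reachable.
        kept = {d for d in kept if d in dist}
    return kept


# "c" was indexed earlier, but the link that led to it has been deleted.
index = {"a", "b", "c"}
links = {"a": ["b"]}

print(sorted(run_job(index, links, ["a"], minimal=True)))              # ['a', 'b', 'c']
print(sorted(run_job(index, links, ["a"], minimal=False)))             # ['a', 'b']
print(sorted(run_job(index, links, ["a"], max_hops=2, minimal=True)))  # ['a', 'b']
```

In this sketch the minimal run keeps "c" only while hop count filtering is
off; once a limit is set, "c" is dropped even though the cleanup phase never
runs.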

If your goal is to avoid removing any previously crawled documents, then I
am afraid that ManifoldCF (MCF) has no real support for that model.  "Start
minimal" is certainly not going to help you.

Thanks,
karl
