manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Question about using ManifoldCF Repository Connectors
Date Tue, 15 Jul 2014 21:16:39 GMT
Hi Prasad,

Re: the scanOnly flag: Technically it is up to your connector to determine
how to use this flag.  It is set when the document has not changed from the
previous run.
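
As a rough illustration of one way a connector might honor the flag, here is a minimal, self-contained sketch. It does not use the real ManifoldCF interfaces: CrawlActivity, RecordingActivity, ScanOnlySketch, and the children map are hypothetical stand-ins invented for this example. The assumption is that references are always re-reported, while ingestion is skipped for documents flagged as unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the framework's activity object (not the real ManifoldCF interface).
interface CrawlActivity {
    void addReference(String childId);  // report a discovered child document
    void ingest(String documentId);     // hand the document to the index
}

// Simple recorder so the behavior of the sketch can be inspected.
class RecordingActivity implements CrawlActivity {
    final List<String> references = new ArrayList<>();
    final List<String> ingested = new ArrayList<>();

    public void addReference(String childId) { references.add(childId); }
    public void ingest(String documentId) { ingested.add(documentId); }
}

class ScanOnlySketch {
    // scanOnly[i] == true means documentIds[i] has not changed since the previous run.
    static void processDocuments(String[] documentIds, boolean[] scanOnly,
                                 Map<String, List<String>> children,
                                 CrawlActivity activity) {
        for (int i = 0; i < documentIds.length; i++) {
            String id = documentIds[i];
            // Always rediscover references, so discovery stays complete on every run.
            for (String child : children.getOrDefault(id, Collections.<String>emptyList())) {
                activity.addReference(child);
            }
            // Skip the fetch-and-index step for unchanged documents.
            if (!scanOnly[i]) {
                activity.ingest(id);
            }
        }
    }
}
```

A connector that ignores the flag would simply call ingest() unconditionally, which matches the behavior described in the question quoted below.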

The flag was originally added to help support chained models before
explicit CHAINED model choices were implemented in the framework.  For
chained models, discovery would not necessarily work correctly unless all
references could be rediscovered at all times.  In MCF 1.7, all of this
will be deprecated: the getDocumentVersions() and processDocuments()
methods are merged into one method, and an IProcessActivity method
is provided to check for differences from the previous indexing.
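
To make the merged flow more concrete, here is a minimal sketch of the idea, assuming a single processing method that computes the current version and asks the activity object whether that differs from what was indexed last time. VersionAwareActivity, needsReindexing(), and MergedProcessSketch are hypothetical names for illustration only; they are not the actual ManifoldCF 1.7 signatures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the framework's activity object (not the real IProcessActivity).
class VersionAwareActivity {
    private final Map<String, String> lastIndexedVersion = new HashMap<>();
    final List<String> ingested = new ArrayList<>();

    // True when the document was never indexed or its version string has changed.
    boolean needsReindexing(String documentId, String newVersion) {
        return !newVersion.equals(lastIndexedVersion.get(documentId));
    }

    // Record the version that was indexed so that later runs can compare against it.
    void ingest(String documentId, String version) {
        lastIndexedVersion.put(documentId, version);
        ingested.add(documentId);
    }
}

class MergedProcessSketch {
    // One method does both jobs: look up the current version, then index only if it changed.
    static void processDocuments(String[] documentIds,
                                 Map<String, String> currentVersions,
                                 VersionAwareActivity activity) {
        for (String id : documentIds) {
            String version = currentVersions.get(id);
            if (version == null) {
                continue;  // document no longer exists; a real connector would request its removal
            }
            if (activity.needsReindexing(id, version)) {
                activity.ingest(id, version);
            }
        }
    }
}
```

Running processDocuments() twice with the same versions indexes each document only once; changing one version string between runs causes only that document to be indexed again.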

Hope this answers your question.


On Tue, Jul 15, 2014 at 5:06 PM, Paththamestrige Perera <> wrote:

> Hello All,
> I'm new to Apache ManifoldCF and I have spent some time referring to the
> publication 'ManifoldCF in Action' as well. I have started using the
> ManifoldCF system with the available repository connectors: the CMIS
> Repository Connector, the Alfresco Repository Connector, and the File
> System Connector.
> I have used them as continuous crawlers with specific re-crawl intervals.
> What I have noticed is that, regardless of the document version (whether it
> has changed or not), in all re-crawl jobs the CMIS and Alfresco connectors
> process all seeded documents. I took a look at their implementations and, as
> far as I could see, these repository connectors do not use the 'scanOnly'
> property, which hints whether the document version has changed, when
> processing seeded documents. It seems intentional by design. So I'm hoping
> to learn why it is necessary to process all seeded documents (as opposed to
> processing only the documents that were updated within the re-crawl interval).
> Thanks!
> Prasad Perera.
