manifoldcf-user mailing list archives

From Karl Wright <>
Subject Re: Question about using ManifoldCF Repository Connectors
Date Tue, 15 Jul 2014 21:41:57 GMT
Hi Prasad,

All changes to connector APIs will be backwards compatible, provided you
extend the base connector class.
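The sketch below shows why extending a base class protects a connector from interface changes. It is a general Java pattern, and every name in it is hypothetical: `RepositoryConnector`, `BaseRepositoryConnector`, `MyConnector` and `getMaxDocumentsPerBatch` are illustrations, not ManifoldCF's actual classes or methods. When a later release adds a method to the interface, the base class supplies a default. Subclasses written against the older interface therefore still compile and run.

```java
// Hypothetical interface; imagine getMaxDocumentsPerBatch() was added in a later release.
interface RepositoryConnector {
    String getName();

    int getMaxDocumentsPerBatch();
}

// Hypothetical base class shipped with the framework: it provides defaults
// for newly added interface methods.
abstract class BaseRepositoryConnector implements RepositoryConnector {
    @Override
    public int getMaxDocumentsPerBatch() {
        return 1; // default used by connectors that don't override it
    }
}

// A connector written against the older interface. It never overrides
// getMaxDocumentsPerBatch(), yet it keeps compiling because it extends the base class.
class MyConnector extends BaseRepositoryConnector {
    @Override
    public String getName() {
        return "my-connector";
    }
}

public class CompatDemo {
    public static void main(String[] args) {
        RepositoryConnector c = new MyConnector();
        System.out.println(c.getName() + " batch=" + c.getMaxDocumentsPerBatch()); // my-connector batch=1
    }
}
```

A connector that implemented the interface directly would stop compiling as soon as the new method appeared. That is why the compatibility promise is tied to extending the base class.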


On Tue, Jul 15, 2014 at 5:35 PM, Paththamestrige Perera <> wrote:

> Hello Karl,
> Thanks for the quick reply!
> I'm using MCF 1.6 and I haven't checked version 1.7 yet (I see it has a
> release date set to 31st of August).
> Regarding the API changes you mentioned (I assume) in your second
> reply, will there be major changes for the output connector as well? (For
> example, will the methods addOrReplaceDocument and removeDocument be
> altered too?) I have my own output connector, working with a
> customized indexing system, and I'm curious to know how things may change
> from 1.6 to 1.7.
> If it matters, I would be glad to create a ticket regarding document
> version handling in the repository connectors for version 1.6, and would be
> happy to get those changes into my project space.
> Thanks!
> Prasad Perera.
> On Tue, Jul 15, 2014 at 5:16 PM, Karl Wright <> wrote:
>> Hi Prasad,
>> Re: the scanOnly flag: Technically it is up to your connector to
>> determine how to use this flag.  It is set when the document has not
>> changed from the previous run.
>> The flag was originally added to help support chained models before
>> explicit CHAINED model choices were implemented in the framework.  For
>> chained models, discovery would not necessarily work correctly unless all
>> references could be rediscovered at all times.  In MCF 1.7, all of this
>> will be deprecated, and the getDocumentVersions() and processDocuments()
>> methods are in fact merged into one method, and an IProcessActivity method
>> is provided to check for differences from the previous indexing.
>> Hope this answers your question.
>> Karl
>> On Tue, Jul 15, 2014 at 5:06 PM, Paththamestrige Perera <> wrote:
>>> Hello All,
>>> I'm new to Apache ManifoldCF, and I have spent some time reading the
>>> publication 'ManifoldCF in Action' as well. I have started using the
>>> ManifoldCF system with the available repository connectors, CMIS Repository
>>> Connector, Alfresco Repository Connector and File System Connector.
>>> I have used them as continuous crawlers with specific re-crawl
>>> intervals. What I have noticed is that, irrespective of the document version
>>> (whether it has changed or not), the CMIS and Alfresco connectors process
>>> all seeded documents in every re-crawl. I took a look at their
>>> implementations, and as far as I could see, these repository connectors do
>>> not use the 'scanOnly' property, which indicates whether the document
>>> version has changed, when processing seeded documents. This seems to be
>>> intentional. So I would like to know why it is necessary to process all
>>> seeded documents (as opposed to processing only the documents that were
>>> updated within the re-crawl interval).
>>> Thanks!
>>> Prasad Perera.
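Karl describes the MCF 1.6 scanOnly flag as set when a document has not changed since the previous run, and says each connector decides how to use it. The sketch below shows one plausible policy. It is not ManifoldCF's real signature: `Activity`, `addReference`, `ingest` and `fetchChildren` are hypothetical stand-ins, and the real IProcessActivity interface differs. In this policy, child references are re-emitted for every document so discovery stays complete, which was the original concern for chained crawls. Only documents whose scanOnly entry is false are sent for indexing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface standing in for the framework's activity object.
interface Activity {
    void addReference(String parentId, String childId);

    void ingest(String documentId);
}

// Records calls so the behavior can be inspected.
class RecordingActivity implements Activity {
    final List<String> events = new ArrayList<>();

    public void addReference(String parentId, String childId) {
        events.add("ref " + parentId + "->" + childId);
    }

    public void ingest(String documentId) {
        events.add("ingest " + documentId);
    }
}

class ScanOnlyAwareConnector {
    // scanOnly[i] is true when document i is unchanged since the previous run.
    void processDocuments(String[] ids, boolean[] scanOnly, Activity activity) {
        for (int i = 0; i < ids.length; i++) {
            // Always rediscover references, so chained crawls still see every child.
            for (String child : fetchChildren(ids[i])) {
                activity.addReference(ids[i], child);
            }
            // Skip re-indexing documents that have not changed.
            if (!scanOnly[i]) {
                activity.ingest(ids[i]);
            }
        }
    }

    // Hypothetical repository lookup; a real connector would query CMIS, Alfresco, etc.
    List<String> fetchChildren(String id) {
        return id.startsWith("folder") ? List.of(id + "/a", id + "/b") : List.of();
    }
}

public class ScanOnlyDemo {
    public static void main(String[] args) {
        RecordingActivity activity = new RecordingActivity();
        new ScanOnlyAwareConnector().processDocuments(
                new String[] {"folder1", "doc2"}, new boolean[] {true, false}, activity);
        // Prints the two references from folder1, then "ingest doc2";
        // folder1 is unchanged, so it is not re-ingested.
        activity.events.forEach(System.out::println);
    }
}
```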

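For MCF 1.7, Karl says getDocumentVersions() and processDocuments() are merged into one method, and that an IProcessActivity method is provided to check for differences from the previous indexing. The sketch below approximates that flow under stated assumptions. `VersionTracker`, `needsReindexing`, `recordIndexed` and `MergedConnector` are hypothetical names, not the real 1.7 API. Here the connector records versions itself for simplicity, whereas the framework would normally keep that bookkeeping. Each document's current version is computed once, and only changed documents are fetched and indexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the framework's record of the last indexed version.
class VersionTracker {
    private final Map<String, String> lastIndexed = new HashMap<>();

    boolean needsReindexing(String id, String currentVersion) {
        return !Objects.equals(currentVersion, lastIndexed.get(id));
    }

    void recordIndexed(String id, String version) {
        lastIndexed.put(id, version);
    }
}

// A single method both determines versions and decides what to process.
class MergedConnector {
    final List<String> indexed = new ArrayList<>();
    // Hypothetical repository: document id -> current version string
    // (for example, a last-modified timestamp).
    private final Map<String, String> repository;

    MergedConnector(Map<String, String> repository) {
        this.repository = repository;
    }

    void processDocuments(List<String> ids, VersionTracker tracker) {
        for (String id : ids) {
            String version = repository.get(id);
            if (tracker.needsReindexing(id, version)) {
                indexed.add(id); // a real connector would fetch and send to the output connector here
                tracker.recordIndexed(id, version);
            }
            // Unchanged documents are skipped without fetching their content.
        }
    }
}

public class MergedDemo {
    public static void main(String[] args) {
        Map<String, String> repo = new HashMap<>(Map.of("a", "v1", "b", "v1"));
        VersionTracker tracker = new VersionTracker();
        MergedConnector connector = new MergedConnector(repo);

        connector.processDocuments(List.of("a", "b"), tracker); // first crawl: both indexed
        repo.put("b", "v2");                                     // only "b" changes
        connector.indexed.clear();
        connector.processDocuments(List.of("a", "b"), tracker); // re-crawl
        System.out.println(connector.indexed);                   // prints [b]
    }
}
```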