manifoldcf-user mailing list archives

From "Florian Schmedding"
Subject Re: Continuous crawling
Date Sun, 05 Jan 2014 14:08:33 GMT
Hi Karl,

Yes, in our case we need to make sure that new documents are discovered
and indexed within a certain interval. I have created a feature request
for that. In the meantime we will try to use a scheduled job.

Thanks for your help,

> Hi Florian,
> What you are seeing is "dynamic crawling" behavior.  The time between
> refetches of a document is based on the history of fetches of that
> document.  The recrawl interval is the initial time between document
> fetches, but if a document does not change, the interval for the document
> increases according to a formula.
> I would need to look at the code to be able to give you the precise
> formula, but if you need a limit on the amount of time between document
> fetch attempts, I suggest you create a ticket and I will look into adding
> that as a feature.
> Thanks,
> Karl
> On Sat, Jan 4, 2014 at 7:56 AM, Florian Schmedding wrote:
>> Hello,
>> the parameters reseed interval and recrawl interval of a continuous
>> crawling job are not quite clear to me. The documentation says that the
>> reseed interval is the time after which the seeds are checked again, and
>> the recrawl interval is the time after which a document is checked for
>> changes.
>> However, we observed that the recrawl interval for a document increases
>> after each check. On the other hand, the reseed interval seems to be set
>> up correctly in the database metadata about the seed documents. Yet the
>> web server does not receive requests each time the interval elapses,
>> but only after several intervals have elapsed.
>> We are using a web connector. The web server does not tell the client to
>> cache the documents. Any help would be appreciated.
>> Best regards,
>> Florian
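
The "dynamic crawling" behavior described in this thread, together with the upper bound on the refetch interval that the feature request asks for, can be illustrated with a minimal sketch. The precise formula ManifoldCF uses is not given in this thread, so the doubling rule below is purely an assumption chosen for illustration, and the names `next_interval`, `simulate`, `growth`, and `max_interval` are hypothetical and do not correspond to ManifoldCF code or settings.

```python
def next_interval(current, changed, initial, growth=2, max_interval=None):
    """Return the wait (in minutes) before the next fetch of one document.

    ASSUMPTION: the interval is multiplied by `growth` whenever the document
    was unchanged. This is an illustrative rule, not ManifoldCF's formula.
    """
    if changed:
        # A changed document falls back to the configured recrawl interval.
        return initial
    grown = current * growth
    if max_interval is not None:
        # Hypothetical cap: never wait longer than max_interval.
        grown = min(grown, max_interval)
    return grown


def simulate(initial, checks, max_interval=None):
    """List the waits before each of `checks` fetches of an unchanged document."""
    intervals = []
    current = initial
    for _ in range(checks):
        intervals.append(current)
        current = next_interval(current, changed=False, initial=initial,
                                max_interval=max_interval)
    return intervals


if __name__ == "__main__":
    print(simulate(60, 5))                    # interval keeps growing
    print(simulate(60, 5, max_interval=300))  # growth stops at the cap
```

Under this assumed rule, an unchanged document is fetched less and less often, which matches the observation that the web server only sees requests after several intervals have elapsed. A cap like the hypothetical `max_interval` would bound how long a document can go unchecked.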
