nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Updated: (NUTCH-61) Adaptive re-fetch interval. Detecting unmodified content
Date Sun, 05 Jun 2005 22:00:41 GMT

Andrzej Bialecki  updated NUTCH-61:

    Attachment: 20050606.diff

The first round:

* change Page to use a 1-byte float, representing fetchInterval in seconds.

* implement a pluggable FetchSchedule, which adjusts fetchInterval and nextFetchTime

* change FetchListTool and UpdateDatabaseTool to use them. NOTE: there appears to have been
a bug in FetchListTool: the fetchlist entries recorded in segments had their fetchTime
increased by 1 week. That is unnecessary; only pages in WebDB need it.

* improve status reporting throughout all plugins.

* change plugins to detect if the content is unchanged. If possible, plugins will not fetch
such content, but in any case they will set their status accordingly.
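
To illustrate the pluggable FetchSchedule idea from the list above, here is a minimal
sketch of an adaptive schedule. It is not the code from 20050606.diff: the names
FetchScheduleSketch and AdaptiveScheduleSketch, the method signatures, and every constant
(bounds and rates) are assumptions made up for this example. The interval shrinks when a
page was found changed and grows when it was unchanged, clamped to fixed bounds.

```java
/** Hypothetical interface; the real FetchSchedule in the patch may look different. */
interface FetchScheduleSketch {
    /** Returns the adjusted fetch interval, in seconds. */
    float adjustInterval(float currentIntervalSec, boolean contentChanged);
}

/** Illustrative adaptive schedule. All constants are assumptions, not Nutch defaults. */
final class AdaptiveScheduleSketch implements FetchScheduleSketch {
    static final float MIN_INTERVAL_SEC = 3600f;      // 1 hour (assumed lower bound)
    static final float MAX_INTERVAL_SEC = 2592000f;   // 30 days (assumed upper bound)
    static final float DEC_RATE = 0.5f;               // shrink factor when content changed
    static final float INC_RATE = 1.5f;               // growth factor when content unchanged

    @Override
    public float adjustInterval(float currentIntervalSec, boolean contentChanged) {
        // Changed pages are revisited sooner, unchanged pages later.
        float next = currentIntervalSec * (contentChanged ? DEC_RATE : INC_RATE);
        // Clamp so the interval never collapses to zero or grows without limit.
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, next));
    }

    /** Next fetch time in milliseconds, given the last fetch time and the interval. */
    static long nextFetchTime(long fetchTimeMillis, float intervalSec) {
        return fetchTimeMillis + Math.round((double) intervalSec * 1000.0);
    }
}
```

With these assumed rates, a page on a 7-day interval (604800 s) that is found changed
moves to 302400 s, and one found unchanged moves to 907200 s.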

> Adaptive re-fetch interval. Detecting unmodified content
> --------------------------------------------------------
>          Key: NUTCH-61
>          URL:
>      Project: Nutch
>         Type: New Feature
>   Components: fetcher
>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>  Attachments: 20050606.diff
> Currently Nutch does not automatically adjust its re-fetch period, regardless of whether
> individual pages change seldom or frequently. The goal of these changes is to extend the
> current codebase to support various possible adjustments to re-fetch times and intervals,
> and specifically a re-fetch schedule that tries to adapt the period between consecutive
> fetches to the period of content changes.
> Also, these patches implement a check for whether the content has changed since the last
> fetch; protocol plugins are also changed to make use of this information, so that unmodified
> content doesn't have to be fetched and processed.
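
One way the unmodified-content check described above could work is sketched below. This
is an assumption, not the mechanism in the attached patch: the class
UnmodifiedCheckSketch, its Status enum, and the classify method are hypothetical names.
The sketch treats an HTTP 304 (Not Modified) response as unchanged without needing a
body, and otherwise compares an MD5 digest of the new content with the digest stored
from the previous fetch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper; names and statuses are illustrative only. */
final class UnmodifiedCheckSketch {
    enum Status { FETCHED_MODIFIED, NOT_MODIFIED }

    /** MD5 digest of the raw content bytes. */
    static byte[] digest(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Classifies a fetch result.
     * httpCode: response code from the server (304 means the server reported no change).
     * newContent: body just fetched, or null if none was transferred.
     * previousDigest: digest stored from the last fetch, or null on a first fetch.
     */
    static Status classify(int httpCode, byte[] newContent, byte[] previousDigest) {
        if (httpCode == 304) {
            return Status.NOT_MODIFIED;   // server-side check, no body transferred
        }
        if (newContent != null && previousDigest != null
                && Arrays.equals(digest(newContent), previousDigest)) {
            return Status.NOT_MODIFIED;   // body fetched, but identical to last time
        }
        return Status.FETCHED_MODIFIED;
    }
}
```

The first branch corresponds to the case where content does not have to be fetched at
all; the second covers protocols or servers that cannot answer a conditional request,
where the content is fetched but further processing can still be skipped.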

This message is automatically generated by JIRA.