nutch-dev mailing list archives

From Ferdy Galema <>
Subject Re: A FetchSchedule bug makes fetch time becoming more and more big
Date Wed, 15 Aug 2012 12:24:21 GMT

Yeah, this is something I noticed too a while ago. Although it does not
directly break crawling, it is not a nice implementation.
Notice that the Generator tries to correct fetch times that lie too far in
the future (in the shouldFetch method of AbstractFetchSchedule).
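To illustrate the kind of correction meant here, below is a minimal,
hypothetical Java sketch of a shouldFetch-style check. The Page class, its
fields and the maxIntervalSecs parameter are assumptions made for the example,
not Nutch's actual classes. It only shows the idea of capping an oversized
interval and pulling a fetch time that lies more than a maximum interval in
the future back to the current time.

```java
/**
 * Hypothetical, simplified model of a shouldFetch-style check that pulls back
 * fetch times scheduled too far in the future. Names are illustrative only;
 * this is not Nutch's API.
 */
public class ShouldFetchSketch {

    /** Minimal stand-in for a crawl record (hypothetical). */
    static final class Page {
        long fetchTime;    // scheduled next fetch, epoch millis
        int fetchInterval; // re-fetch interval, seconds

        Page(long fetchTime, int fetchInterval) {
            this.fetchTime = fetchTime;
            this.fetchInterval = fetchInterval;
        }
    }

    /**
     * Returns true if the page is due at curTime. If its fetch time lies more
     * than maxIntervalSecs past curTime, the schedule is treated as broken:
     * an oversized interval is capped and the fetch time is reset to curTime.
     */
    static boolean shouldFetch(Page page, long curTime, int maxIntervalSecs) {
        if (page.fetchTime - curTime > maxIntervalSecs * 1000L) {
            if (page.fetchInterval > maxIntervalSecs) {
                page.fetchInterval = (int) (maxIntervalSecs * 0.9f);
            }
            page.fetchTime = curTime;
        }
        return page.fetchTime <= curTime;
    }

    public static void main(String[] args) {
        long now = 1_000_000_000L;
        int maxInterval = 90 * 24 * 3600; // assumed 90-day limit, in seconds

        Page drifted = new Page(now + 365L * 24 * 3600 * 1000, 30 * 24 * 3600);
        System.out.println(shouldFetch(drifted, now, maxInterval)); // true, fetchTime reset to now

        Page notDue = new Page(now + 3_600_000L, 30 * 24 * 3600);
        System.out.println(shouldFetch(notDue, now, maxInterval)); // false, due in one hour
    }
}
```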

As a matter of fact, I have slightly refactored the updating process so that
the fetch time is updated only once (directly after a fetch, that is). The best
part is that this change allows running several generate-fetch cycles without
running the updater every time. There is a slight downside, but I will describe
it in the issue after I have attached a patch for this.
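A rough sketch of the idea, assuming a toy in-memory crawl db. The Page class
and the generate/fetch methods below are hypothetical and are not the actual
patch. The next fetch time is computed once, right after the fetch, from the
real fetch time, so the following generate step already skips the page
without an updater run in between.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: the next fetch time is set once, directly after a
 * fetch. Names are illustrative, not Nutch classes.
 */
public class FetchOnceSketch {

    static final class Page {
        final String url;
        long fetchTime;            // next scheduled fetch, epoch millis
        final long intervalMillis; // re-fetch interval

        Page(String url, long fetchTime, long intervalMillis) {
            this.url = url;
            this.fetchTime = fetchTime;
            this.intervalMillis = intervalMillis;
        }
    }

    /** Generator step: select pages that are due at 'now'. */
    static List<Page> generate(List<Page> db, long now) {
        List<Page> due = new ArrayList<>();
        for (Page p : db) {
            if (p.fetchTime <= now) {
                due.add(p); // same object as in db, so fetch() updates db in place
            }
        }
        return due;
    }

    /** Fetcher step: schedule the next fetch once, based on the real fetch time. */
    static void fetch(List<Page> batch, long now) {
        for (Page p : batch) {
            p.fetchTime = now + p.intervalMillis;
        }
    }

    public static void main(String[] args) {
        long day = 24L * 3600 * 1000;
        List<Page> db = new ArrayList<>();
        db.add(new Page("http://example.com/a", 0L, 7 * day));
        db.add(new Page("http://example.com/b", 0L, 7 * day));

        long now = 10 * day;
        fetch(generate(db, now), now); // cycle 1 fetches both pages

        // Cycle 2 one day later, with no updater run in between: nothing is due.
        System.out.println(generate(db, now + day).size()); // 0
        System.out.println(db.get(0).fetchTime / day);      // 17
    }
}
```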


On Wed, Aug 15, 2012 at 2:11 PM, lin weijian <> wrote:

> Hi,
> When DbUpdateReducer executes, it calls setFetchSchedule for a fetched
> page. This function adds the fetch interval to the new fetch time, no
> matter whether the interval has already been added. This makes the fetch
> time grow larger and larger. It should add the fetch interval to the last
> fetch time instead.
>     Thanks.
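A small, hypothetical Java illustration of the drift described above. The
Record class and both update methods are made up for the example and are not
the real FetchSchedule code. Applying the update again to a record that was not
re-fetched keeps pushing the stored fetch time out by one interval each time,
whereas basing it on the last fetch time gives the same result every time.

```java
/**
 * Hypothetical illustration of the reported drift. Names are illustrative
 * only; this is not Nutch's FetchSchedule API.
 */
public class FetchTimeDrift {

    /** Minimal stand-in for a crawl record. */
    static final class Record {
        final long prevFetchTime; // when the page was actually last fetched (millis)
        long fetchTime;           // next scheduled fetch (millis)

        Record(long prevFetchTime, long fetchTime) {
            this.prevFetchTime = prevFetchTime;
            this.fetchTime = fetchTime;
        }
    }

    /** Reported behaviour: interval added to whatever fetch time is stored. */
    static void buggyUpdate(Record r, long intervalMillis) {
        r.fetchTime = r.fetchTime + intervalMillis;
    }

    /** Suggested fix: interval added to the last actual fetch time. */
    static void fixedUpdate(Record r, long intervalMillis) {
        r.fetchTime = r.prevFetchTime + intervalMillis;
    }

    public static void main(String[] args) {
        long day = 24L * 3600 * 1000;
        long interval = 30 * day;
        Record buggy = new Record(100 * day, 100 * day);
        Record fixed = new Record(100 * day, 100 * day);

        // Run the update step three times without a new fetch in between.
        for (int run = 1; run <= 3; run++) {
            buggyUpdate(buggy, interval);
            fixedUpdate(fixed, interval);
            System.out.println("run " + run + ": buggy=day " + buggy.fetchTime / day
                    + ", fixed=day " + fixed.fetchTime / day);
        }
        // buggy reaches days 130, 160, 190; fixed stays at day 130.
    }
}
```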
