nutch-dev mailing list archives

From Ferdy Galema <>
Subject Re: Adaptive generate.max.count
Date Fri, 04 Nov 2011 12:39:25 GMT
Hi Markus,

I was wondering what exactly you mean by dynamic. Is it a value that differs
per fetch cycle but applies to all queues, or do you mean a different value
for different queues? (For example, when the type is HOST, hostA would have a
different generate.max.count than hostB.)


On 11/04/2011 12:32 AM, Markus Jelsma wrote:
> Hi,
> The generate.max.count setting defines the maximum number of records per
> queue of a given type. We're looking for an improvement that makes this
> setting dynamic. The new variable would be the total number of records for
> that type of queue (ip, host, domain).
> How can we adapt the generator for this? The problem is that there's no
> information on the number of records for the queue of a given URL.
> Any thoughts? Could we perhaps modify the updater to count the number of
> records for a queue and write it to the CrawlDatum, without building a new
> updater tool based on the information provided by the current
> domainstatistics tool?
> Thanks
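
To make the question concrete, here is a minimal sketch of the "different value per queue" reading, assuming queues of type HOST. It is not Nutch code: the class AdaptiveMaxCount, the method names, and the rule "a fixed fraction of the queue's total, clamped to a range" are all hypothetical. It only illustrates that once a per-queue record count is available (for example, written by the updater), a per-queue limit can be derived from it.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Nutch.
public class AdaptiveMaxCount {

    /** Counts known records per host. URLs without a parseable host are skipped. */
    static Map<String, Integer> countPerHost(List<String> urls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String host;
            try {
                host = URI.create(url).getHost();
            } catch (IllegalArgumentException e) {
                continue; // malformed URL, ignore it
            }
            if (host == null) {
                continue;
            }
            counts.merge(host.toLowerCase(), 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Assumed rule: allow a fixed fraction of the queue's total records per
     * fetch cycle, clamped to the range [min, max].
     */
    static int adaptiveLimit(int totalRecords, double fraction, int min, int max) {
        int scaled = (int) Math.ceil(totalRecords * fraction);
        return Math.max(min, Math.min(max, scaled));
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://a.example/1", "http://a.example/2", "http://b.example/");
        // Each host gets its own limit, derived from its own record count.
        for (Map.Entry<String, Integer> e : countPerHost(urls).entrySet()) {
            int limit = adaptiveLimit(e.getValue(), 0.5, 1, 400);
            System.out.println(e.getKey() + " total=" + e.getValue() + " limit=" + limit);
        }
    }
}
```

In this reading hostA and hostB end up with different limits in the same fetch cycle. The other reading, one value per cycle shared by all queues, would need only a single number and no per-queue counts.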
