nutch-dev mailing list archives

From Ferdy Galema <ferdy.gal...@kalooga.com>
Subject Re: Adaptive generate.max.count
Date Fri, 04 Nov 2011 12:39:25 GMT
Hi Markus,

I was wondering what exactly you mean by dynamic. Is the value different
per fetch cycle but the same for all queues, or do you mean a different
value for different queues? (For example, when the type is HOST, hostA
will have a different generate.max.count than hostB.)

Ferdy.

On 11/04/2011 12:32 AM, Markus Jelsma wrote:
> Hi,
>
> The generate.max.count defines the number of records per type of queue. We're
> looking for an improvement to make this setting dynamic. The new variable
> would be the total number of records for that type of queue (ip, host,
> domain).
>
> How can we adapt the generator for this? The problem is that there's no
> information on the number of records for a given URL.
>
> Any thoughts? Could we perhaps modify the updater to count the number of
> records for a queue and write it to the CrawlDatum without building a new
> updater tool based on the information provided by the current domainstatistics
> tool?
>
> Thanks
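
To make the two readings in this thread concrete, here is a minimal sketch of the
per-queue interpretation: each queue (for example, each host) gets its own limit
derived from the total number of records known for it. Everything in it is an
assumption for illustration only: the class AdaptiveMaxCount, the percentage
rule, and the floor/ceiling parameters are hypothetical and are not part of the
Nutch Generator or of the proposal above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not a Nutch class: derives a per-queue limit
// from the total number of records known for that queue.
final class AdaptiveMaxCount {

    // Returns percent% of totalRecords (rounded up), clamped to [floor, ceiling].
    // The percentage rule is an assumed policy chosen only for illustration.
    static int adaptiveLimit(long totalRecords, int percent, int floor, int ceiling) {
        long scaled = (totalRecords * percent + 99) / 100; // integer ceil of the share
        return (int) Math.max(floor, Math.min(ceiling, scaled));
    }

    // Applies adaptiveLimit to every queue key (host, domain or IP).
    static Map<String, Integer> limitsPerQueue(Map<String, Long> totals,
                                               int percent, int floor, int ceiling) {
        Map<String, Integer> limits = new TreeMap<>();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            limits.put(e.getKey(), adaptiveLimit(e.getValue(), percent, floor, ceiling));
        }
        return limits;
    }

    public static void main(String[] args) {
        Map<String, Long> totals = new TreeMap<>();
        totals.put("hostA", 1000L); // assumed record counts per queue
        totals.put("hostB", 20L);
        System.out.println(limitsPerQueue(totals, 10, 10, 500));
    }
}
```

With these assumed inputs, hostA and hostB end up with different limits, which is
the second case asked about above. A value that changes per fetch cycle but is
shared by all queues would instead need only a single number per cycle.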
