nutch-dev mailing list archives

From "Daniel D." <nutchfo...@gmail.com>
Subject Re: Seeking help in understanding – fetch, refetch & co.
Date Fri, 10 Jun 2005 03:52:24 GMT
Hi Andrzej,

I was looking in the wrong place. I had modified the code to ignore the 
fetchinterval value coming from the fetchlist. I didn't realize until now 
that URLs that are not yet due are not included in the fetchlist at all. The 
code in FetchListTool.emitFetchList() is very easy to follow and understand, 
and now it's clear to me. Thanks for your help.
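
To put my understanding in concrete terms, here is a rough sketch of the due-date check. It is not the actual Nutch code: it is written in Python only for brevity, and the Page class, the next_fetch_ms field and the emit_fetch_list function are hypothetical names I made up for illustration. The only point is that pages which are not yet due never reach the fetchlist, so the fetcher never sees them.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000  # one day in milliseconds


@dataclass
class Page:
    url: str
    next_fetch_ms: int  # hypothetical field: earliest time the page is due again


def emit_fetch_list(pages, now_ms):
    """Return only the URLs whose next fetch time has already arrived."""
    return [p.url for p in pages if p.next_fetch_ms <= now_ms]


# Illustrative data only: one page that is due and one that is not.
now = 30 * DAY_MS
pages = [
    Page("http://example.com/due", now - DAY_MS),
    Page("http://example.com/not-due", now + DAY_MS),
]
due_urls = emit_fetch_list(pages, now)
```

With this shape, ignoring the interval inside the fetcher has no effect on pages that are not due, because they were filtered out one step earlier.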

 One more question:

 Where can I find information regarding the memory (disk) usage of the 
WebDB and the CPU usage of bin/nutch updatedb? I'm looking for something 
like: for 1,000,000 documents the WebDB will take approximately XX GB, and 
running bin/nutch updatedb on 1,000,000 documents will use up to XX MB of RAM.

 Thanks,

Daniel
