nutch-dev mailing list archives

From mina <>
Subject recrawl sites in nutch 1.3
Date Tue, 01 Nov 2011 12:04:58 GMT
Hi, I want to recrawl my sites every hour, so I wrote a script for this and
edited some properties in nutch-site.xml. However, my recrawler fetches URLs
only three times and then stops fetching, meaning my Nutch crawl no longer
updates after three hours. These are my changes in nutch-site.xml:

  <description>The default number of seconds between re-fetches of a page
(30 days).</description> 
  <description>The implementation of fetch schedule. DefaultFetchSchedule
simply adds the original fetchInterval to the last fetch time, regardless of
page changes.</description> 
  <description>Defines the number of documents to send to Solr in a single
update batch. Decrease when handling very large documents to prevent Nutch
from running out of memory.</description> 
  <description>The maximum number of seconds between re-fetches of a page
(90 days). After this period every page in the db will be re-tried, no
matter what is its status.</description> 

Sent from the Nutch - Dev mailing list archive at
