nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "Tutorial on incremental crawling" by Gabriele Kahlout
Date Sun, 27 Mar 2011 12:56:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Tutorial on incremental crawling" page has been changed by Gabriele Kahlout.
http://wiki.apache.org/nutch/Tutorial%20on%20incremental%20crawling?action=diff&rev1=6&rev2=7

--------------------------------------------------

  # 1. $ mv whole-web-crawling-incremental $NUTCH_HOME/whole-web-crawling-incremental
  # 2. $ cd $NUTCH_HOME
  # 3. $ chmod +x whole-web-crawling-incremental
- # 4. $ ./whole-web-crawling-incremental
+ # 4. $ ./whole-web-crawling-incremental seeds 5 2
  
- # Usage: ./whole-web-crawling-incremental [it_seedsDir-path urls-to-fetch-per-iteration depth]
+ # Usage: ./whole-web-crawling-incremental it_seedsDir-path urls-to-fetch-per-iteration depth
  # Start
  
  rm -r crawl # fresh crawl
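
The revised usage line makes all three arguments mandatory (seeds directory, URLs to fetch per iteration, depth). A minimal sketch of how a script could enforce that is shown below. It is not taken from the wiki page: the function name `check_args` is hypothetical, and the rule that the second and third arguments must be non-negative integers is an assumption.

```shell
#!/bin/sh
# Hypothetical helper, not part of the wiki script: checks that exactly
# three positional arguments are given, as the revised usage line requires.
check_args() {
  if [ "$#" -ne 3 ]; then
    echo "Usage: ./whole-web-crawling-incremental it_seedsDir-path urls-to-fetch-per-iteration depth" >&2
    return 1
  fi
  # Assumption: urls-to-fetch-per-iteration and depth are non-negative integers.
  case "$2" in
    ''|*[!0-9]*) echo "urls-to-fetch-per-iteration must be an integer: $2" >&2; return 1 ;;
  esac
  case "$3" in
    ''|*[!0-9]*) echo "depth must be an integer: $3" >&2; return 1 ;;
  esac
  return 0
}
```

A script could call `check_args "$@" || exit 1` before any destructive step such as `rm -r crawl`, so that a call without arguments (as in the old step 4) stops before removing an existing crawl.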