nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Trivial Update of "RunningNutchAndSolr" by LewisJohnMcgibbney
Date Fri, 02 Sep 2011 19:21:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RunningNutchAndSolr?action=diff&rev1=69&rev2=70

  }}} 
  
  This will include any url in the domain nutch.apache.org.
+ 
+ Now we are ready to initiate a crawl. Use the following parameters:
+ 
+  * '''-dir''' ''dir'' names the directory to put the crawl in.
+  * '''-threads''' ''threads'' determines the number of threads that will fetch in parallel.
+  * '''-depth''' ''depth'' indicates the link depth from the root page that should be crawled.
+  * '''-topN''' ''N'' determines the maximum number of pages that will be retrieved at each level up to the depth.
   * Run the following command:
  {{{
  bin/nutch crawl urls -dir crawl -depth 3 -topN 5

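The parameters listed in the diff above can be combined into one invocation. The following is a minimal dry-run sketch, not part of the wiki page: the variable names are hypothetical, the `-threads` value of 10 is an assumed example, and the page-count bound is only a rough reading of the `-topN` description (at most N pages per level, over `depth` levels). The script prints the command rather than running it, so it works without a Nutch installation.

```shell
#!/bin/sh
# Hypothetical variable names; values mirror the command quoted in the diff,
# except THREADS, which is an assumed example value.
SEED_DIR="urls"     # directory holding the seed URL list
CRAWL_DIR="crawl"   # -dir: directory to put the crawl in
THREADS=10          # -threads: number of threads that fetch in parallel (assumed)
DEPTH=3             # -depth: link depth from the root page
TOP_N=5             # -topN: maximum pages retrieved at each level

# Rough upper bound on pages fetched: at most TOP_N pages at each of DEPTH levels.
MAX_PAGES=$((DEPTH * TOP_N))

# Build the command as a string so it can be inspected before it is run.
CRAWL_CMD="bin/nutch crawl $SEED_DIR -dir $CRAWL_DIR -threads $THREADS -depth $DEPTH -topN $TOP_N"

# Dry run: print instead of executing.
echo "$CRAWL_CMD"
echo "upper bound on pages fetched: $MAX_PAGES"
```

To actually start the crawl, the printed command would be run from the Nutch installation directory; small `-depth` and `-topN` values like these keep a first test crawl short.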