nutch-dev mailing list archives

From Apache Wiki <>
Subject [Nutch Wiki] Trivial Update of "RunningNutchAndSolr" by LewisJohnMcgibbney
Date Fri, 02 Sep 2011 19:21:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "RunningNutchAndSolr" page has been changed by LewisJohnMcgibbney:

  This will include any url in the domain
+ Now we are ready to initiate a crawl. Use the following parameters:
+  * '''-dir''' ''dir'' names the directory to put the crawl in.
+  * '''-threads''' ''threads'' determines the number of threads that will fetch in parallel.
+  * '''-depth''' ''depth'' indicates the link depth from the root page that should be crawled.
+  * '''-topN''' ''N'' determines the maximum number of pages that will be retrieved at each level up to the depth.
   * Run the following command:
  bin/nutch crawl urls -dir crawl -depth 3 -topN 5
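
The command above omits '''-threads''', so here is a minimal sketch of how all four parameters might be combined. The thread count of 10 and the variable names are illustrative assumptions, not values from the wiki page. The sketch only prints the command (a dry run) along with a rough upper bound on fetched pages, assuming each of the ''depth'' rounds fetches at most ''topN'' pages.

```shell
#!/bin/sh
# Illustrative values: only -dir, -depth and -topN match the wiki example.
CRAWL_DIR=crawl   # directory to put the crawl in (-dir)
THREADS=10        # assumed value; number of parallel fetch threads (-threads)
DEPTH=3           # link depth from the root page (-depth)
TOPN=5            # maximum pages retrieved at each level (-topN)

# Assemble the command as a string so it can be inspected before running.
CMD="bin/nutch crawl urls -dir $CRAWL_DIR -threads $THREADS -depth $DEPTH -topN $TOPN"

# Rough upper bound: at most TOPN pages per level, over DEPTH levels.
MAX_PAGES=$((DEPTH * TOPN))

echo "$CMD"
echo "Upper bound on pages fetched: $MAX_PAGES"

# To actually start the crawl, run $CMD from the Nutch installation directory.
```

With the values from the wiki example (depth 3, topN 5), the bound works out to 15 pages, which is why that command makes a quick smoke test before a larger crawl.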
