nutch-dev mailing list archives

From Apache Wiki <>
Subject [Nutch Wiki] Update of "NutchTutorial" by riverma
Date Thu, 04 Sep 2014 00:07:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "NutchTutorial" page has been changed by riverma:

  Nutch developers have written one for you :), and it is available at [[bin/crawl]].
-      Usage: bin/crawl <seedDir> <crawlID> <solrURL> <numberOfRounds>
+      Usage: bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
-      Example: bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/ 2
+      Example: bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/ 2
-      Or you can use:
-      Example: bin/nutch crawl urls -solr http://localhost:8983/solr/ -depth 3 -topN 5
  The crawl script has a lot of parameters set, and you can modify the parameters to suit your needs.
It would be ideal to understand the parameters before setting up big crawls.
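
As a rough illustration of the updated usage line, the following sketch assembles a bin/crawl command from the four positional arguments (seedDir, crawlDir, solrURL, numberOfRounds) and prints it instead of running it. The helper name build_crawl_cmd is hypothetical and not part of Nutch, and the check that numberOfRounds is a positive integer is an assumption made for this example, not something the wiki page states.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): build the bin/crawl command line
# from the four positional arguments in the updated usage line and print it,
# so the invocation can be reviewed before a real crawl is started.
build_crawl_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "Usage: build_crawl_cmd <seedDir> <crawlDir> <solrURL> <numberOfRounds>" >&2
        return 1
    fi
    seed_dir=$1
    crawl_dir=$2
    solr_url=$3
    rounds=$4

    # Assumption for this sketch: numberOfRounds must be a positive integer.
    case $rounds in
        ''|*[!0-9]*)
            echo "numberOfRounds must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$rounds" -le 0 ]; then
        echo "numberOfRounds must be a positive integer" >&2
        return 1
    fi

    # Print the command rather than executing it (dry run).
    printf 'bin/crawl %s %s %s %s\n' "$seed_dir" "$crawl_dir" "$solr_url" "$rounds"
}

# Values taken from the example line in the diff above.
build_crawl_cmd urls/ TestCrawl/ http://localhost:8983/solr/ 2
```

Printing the command first keeps the sketch safe to run on a machine without Nutch installed; once the arguments look right, the printed line can be run from the Nutch runtime directory.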
