nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Trivial Update of "bin/crawl" by LewisJohnMcgibbney
Date Sat, 13 Jun 2015 18:26:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/crawl" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/bin/crawl?action=diff&rev1=1&rev2=2

- The bin/crawl script gives more command during a crawl. Instead of using org.apache.nutch.crawl.Crawl class, it uses individual steps (inject->generate->fetch->parse->updatedb) during a crawl. It is recommended to use this instead of using the [[bin/nutch crawl]] command. 
+ = Description =
+ The bin/crawl script gives more command during a crawl. It uses individual steps (inject->generate->fetch->parse->updatedb) during a crawl. 
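
For readers unfamiliar with the individual steps named above, the following is a minimal dry-run sketch of how they might be sequenced in Nutch 1.x. It only prints commands and does not run Nutch. The function name `plan_crawl`, the `crawldb` and `segments` sub-directory names, and the `SEGMENT_<n>` placeholder (standing in for the timestamped directory that the generate step creates) are assumptions for illustration. It also assumes that inject runs once before the rounds begin. The real script passes further options and performs further steps that are not shown here.

```shell
#!/bin/sh
# Dry-run sketch: prints the per-step commands instead of executing them.
# plan_crawl is a hypothetical helper, not part of Nutch.

plan_crawl() {
  seed_dir=$1    # directory containing the seeds file
  crawl_dir=$2   # directory holding crawldb and segments (assumed layout)
  rounds=$3      # number of generate/fetch/parse/updatedb rounds

  # Inject the seed URLs into the crawl database (assumed to happen once).
  echo "bin/nutch inject $crawl_dir/crawldb $seed_dir"

  i=1
  while [ "$i" -le "$rounds" ]; do
    # Placeholder for the timestamped segment directory created by generate.
    segment="$crawl_dir/segments/SEGMENT_$i"
    echo "bin/nutch generate $crawl_dir/crawldb $crawl_dir/segments"
    echo "bin/nutch fetch $segment"
    echo "bin/nutch parse $segment"
    echo "bin/nutch updatedb $crawl_dir/crawldb $segment"
    i=$((i + 1))
  done
}

plan_crawl urls TestCrawl 2
```

Printing the plan first makes it easy to check the order of steps before running anything against a real crawl directory.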
  
+ = Usage =
+ == Nutch 1.X ==
+ {{{
+      Usage: crawl [-i|--index] [-D "key=value"] <Seed Dir> <Crawl Dir> <Num Rounds>
+         -i|--index      Indexes crawl results into a configured indexer
+         -D              A Java property to pass to Nutch calls
+         Seed Dir        Directory in which to look for a seeds file
+         Crawl Dir       Directory where the crawl/link/segments dirs are saved
+         Num Rounds      The number of rounds to run this crawl for
+      Example: bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 2
+ }}}
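
As a small illustration of the Nutch 1.X usage line above, the sketch below assembles a `bin/crawl` command from its arguments and checks that the round count is a whole number before printing it. The helper name `build_crawl_cmd` and its argument order are hypothetical. Only the `-i` and `-D` options and the three positional arguments come from the usage text. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: builds (but does not run) a bin/crawl command line.

build_crawl_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: build_crawl_cmd <solr-url> <seed-dir> <crawl-dir> <num-rounds>" >&2
    return 1
  fi
  # Reject an empty or non-numeric round count.
  case $4 in
    ''|*[!0-9]*)
      echo "num-rounds must be a whole number" >&2
      return 1
      ;;
  esac
  # -i indexes the results; -D passes a Java property through to Nutch.
  echo "bin/crawl -i -D solr.server.url=$1 $2 $3 $4"
}

build_crawl_cmd http://localhost:8983/solr/ urls/ TestCrawl/ 2
```

With the arguments shown, the printed command matches the example given in the usage text.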
+ 
+ == Nutch 2.x ==
+ 
+ = Need Assistance ? =
  
  Please message us in the [[http://nutch.apache.org/mailing_lists.html|user-mailing list]] if you find any issues
  
