nutch-dev mailing list archives

From Apache Wiki <>
Subject [Nutch Wiki] Update of "WhiteListRobots" by ChrisMattmann
Date Sat, 18 Apr 2015 17:35:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "WhiteListRobots" page has been changed by ChrisMattmann:

  == Build the Nutch runtime and execute RobotRulesParser ==
  Now, build the Nutch runtime, e.g., by running ```ant runtime```.
- From your ```runtime/local/```` directory, run this command:
+ From your nutch SVN or git checkout top-level directory, run this command:
  java -cp build/apache-nutch-1.10-SNAPSHOT.job:build/apache-nutch-1.10-SNAPSHOT.jar:runtime/local/lib/hadoop-core-1.2.0.jar:runtime/local/lib/crawler-commons-0.5.jar:runtime/local/lib/slf4j-log4j12-1.6.1.jar:runtime/local/lib/slf4j-api-1.7.9.jar:runtime/local/lib/log4j-1.2.15.jar:runtime/local/lib/guava-11.0.2.jar:runtime/local/lib/commons-logging-1.2.jar:runtime/local/lib/commons-cli-1.2.jar org.apache.nutch.protocol.RobotRulesParser robots.txt urls Nutch-crawler
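
Because the classpath in the command above is long and easy to mistype, it may help to assemble it in a small script. The following is only a sketch: `join_classpath`, `LIB` and `NUTCH_CP` are hypothetical names introduced for illustration and are not part of Nutch. The jar names and versions are copied from the command above and will differ for other builds. The script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Sketch only: join_classpath is a hypothetical helper, not something Nutch ships.
# It joins its arguments with ':' to form a Java classpath string.
join_classpath() {
  joined=""
  for entry in "$@"; do
    if [ -z "$joined" ]; then
      joined="$entry"
    else
      joined="$joined:$entry"
    fi
  done
  printf '%s\n' "$joined"
}

# Assumption: the current directory is the top level of the Nutch checkout
# and `ant runtime` has already populated build/ and runtime/local/lib/.
LIB=runtime/local/lib

NUTCH_CP=$(join_classpath \
  build/apache-nutch-1.10-SNAPSHOT.job \
  build/apache-nutch-1.10-SNAPSHOT.jar \
  "$LIB/hadoop-core-1.2.0.jar" \
  "$LIB/crawler-commons-0.5.jar" \
  "$LIB/slf4j-log4j12-1.6.1.jar" \
  "$LIB/slf4j-api-1.7.9.jar" \
  "$LIB/log4j-1.2.15.jar" \
  "$LIB/guava-11.0.2.jar" \
  "$LIB/commons-logging-1.2.jar" \
  "$LIB/commons-cli-1.2.jar")

# Dry run: print the command so it can be checked before it is executed.
echo java -cp "$NUTCH_CP" org.apache.nutch.protocol.RobotRulesParser robots.txt urls Nutch-crawler
```

Removing the leading `echo` from the last line would run the parser with the same arguments (`robots.txt`, `urls`, `Nutch-crawler`) as the command above.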
