nutch-dev mailing list archives

From Steffen Viken Valvåg <steff...@cs.uit.no>
Subject Whole-web crawling with the mapreduce branch
Date Thu, 15 Sep 2005 09:47:06 GMT
Hi,

I'm playing around with the mapreduce branch, and got it working for a
simple intranet crawl by following the Nutch tutorial at
http://lucene.apache.org/nutch/tutorial.html.  The tutorial seems
inapplicable when it comes to whole-web crawling, though, as the "nutch
admin" command has been disabled, and the usage of the "nutch inject"
command seems to have changed.  I'm willing to read the source to get up to
speed, but if there is any other documentation on the mapreduce branch, that
would obviously be helpful.  I would also greatly appreciate it if someone
took the time to give me a short bullet list of commands to get me started
on a whole-web crawl.

Thanks,
Steffen

