nutch-dev mailing list archives

From: Daniele Menozzi <me...@ngi.it>
Subject: Problems on Crawling
Date: Fri, 16 Sep 2005 16:50:11 GMT
Hi all, I have some questions regarding org.apache.nutch.tools.CrawlTool: I
have not really understood the relationship between depth, segments, and
fetching.
Take the tutorial, for example; I understand these 2 steps:

	bin/nutch admin db -create
	bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
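
(As a sanity check after the inject, I have been dumping the db statistics;
assuming I am reading the WebDBReader output correctly, this shows how many
pages and links the db now holds:)

	bin/nutch readdb db -stats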

but, when I do this:
	
	bin/nutch generate db segments

what happens? I think that a dir called 'segments' is created, and inside of
it I can find the links I have previously injected. Ok.
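
(For what it's worth, after running generate I see one new timestamped
directory under segments/, and the tutorial picks it up like this; the
timestamp in the comment is only an example from my run:)

	ls segments				# shows something like: 20050916165011
	s1=`ls -d segments/2* | tail -1`	# $s1 = the newest segment dir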
Next steps:

	bin/nutch fetch $s1 	
	bin/nutch updatedb db $s1 

Ok, no problems here. 
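
(One experiment that helped me: run the same readdb -stats dump once before
and once after updatedb. The page and link counts grow, which makes me think
the outlinks found during fetching are folded back into the db, but this is
only my guess:)

	bin/nutch readdb db -stats	# compare counts before and after updatedb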
But now I cannot understand what happens with this command:

	bin/nutch generate db segments

it is the same command as above, but this time I have not injected anything
into the db; it only contains the pages I have previously fetched.
So, does it mean that when I generate a segment, it will automagically be
filled with the links found in fetched pages? Where are these links saved,
and which component saves them?
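
(To make my question concrete, here is my current mental model of what
CrawlTool does internally for depth=3, written as the shell steps from the
tutorial; the loop itself is only my sketch, not the actual tool:)

	bin/nutch admin db -create
	bin/nutch inject db -dmozfile content.rdf.u8 -subset 3000
	for i in 1 2 3; do			# one round per level of depth?
		bin/nutch generate db segments	# write a new fetchlist segment from the db
		s=`ls -d segments/2* | tail -1`	# the segment generate just created
		bin/nutch fetch $s		# fetch the pages (and parse out outlinks?)
		bin/nutch updatedb db $s	# add newly discovered links to the db
	done

Is that roughly what happens?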

Thank you so much, this work is really interesting!
	Menoz

-- 
		      Free Software Enthusiast
		 Debian Powered Linux User #332564 
		     http://menoz.homelinux.org
