nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject "db.max.outlinks.per.page" is misunderstood?
Date Wed, 07 Sep 2005 16:43:24 GMT
Hi All

Here is the "db.max.outlinks.per.page" property and its description in
nutch-default.xml:
	<property>
	  <name>db.max.outlinks.per.page</name>
	  <value>100</value>
	  <description>The maximum number of outlinks that we'll process for a page.
	  </description>
	</property>

I don't think the description is right.
Say the URLs fed to my crawler are:
http://www.a.com/index.php (90 outlinks)
http://www.b.com/index.jsp (80 outlinks)
http://www.c.com/index.html (50 outlinks)

and the number of crawler threads is 30. Do you think the remaining URLs
((80 - 10) outlinks + 50 outlinks) will be fetched?

I think the description should be "The maximum number of outlinks in
one fetching phase."
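
To make the two readings concrete, here is a small sketch in Python. It
is not Nutch code and says nothing about what Nutch actually does; the
function names per_page_limit and per_phase_limit are made up for
illustration, and the page counts are the ones from the example above.

```python
# Hypothetical illustration only; this is not Nutch code.
MAX_OUTLINKS = 100  # value of db.max.outlinks.per.page

# Outlink counts from the example in this message.
pages = {
    "http://www.a.com/index.php": 90,
    "http://www.b.com/index.jsp": 80,
    "http://www.c.com/index.html": 50,
}


def per_page_limit(pages, limit):
    """Reading 1: the cap applies to each page separately."""
    return {url: min(count, limit) for url, count in pages.items()}


def per_phase_limit(pages, limit):
    """Reading 2: the cap is one shared budget for the whole fetching phase."""
    remaining = limit
    kept = {}
    for url, count in pages.items():
        take = min(count, remaining)
        kept[url] = take
        remaining -= take
    return kept


per_page = per_page_limit(pages, MAX_OUTLINKS)
per_phase = per_phase_limit(pages, MAX_OUTLINKS)

# Outlinks left unprocessed under reading 2: (80 - 10) + 50.
dropped = sum(pages.values()) - sum(per_phase.values())

print(per_page)
print(per_phase)
print(dropped)
```

Under the first reading all 220 outlinks are kept, since no page has more
than 100. Under the second reading only the first 100 are kept and the
rest are the "remaining URLs" I am asking about.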


Regards
/Jack
-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars
