nutch-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: "db.max.outlinks.per.page" is misunderstood?
Date Wed, 07 Sep 2005 17:28:23 GMT
Jack,
That is the maximum number of outlinks per HTML page.
All your example pages have fewer than 100 outlinks, right?
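
To make the per-page reading concrete, here is a minimal sketch in Python. It is not Nutch's actual code: the helper limit_outlinks and the generated placeholder links are hypothetical, and the only values taken from this thread are the limit of 100 and the three example pages with their outlink counts. It assumes the limit is applied to each page independently, as described above.

```python
MAX_OUTLINKS_PER_PAGE = 100  # value of db.max.outlinks.per.page in the quoted config


def limit_outlinks(outlinks, max_per_page=MAX_OUTLINKS_PER_PAGE):
    """Hypothetical helper: keep at most max_per_page outlinks of ONE page."""
    return outlinks[:max_per_page]


# The three seed pages from the example, with their outlink counts.
pages = {
    "http://www.a.com/index.php": 90,
    "http://www.b.com/index.jsp": 80,
    "http://www.c.com/index.html": 50,
}

kept = {}
for url, count in pages.items():
    # Placeholder outlinks; real ones would come from parsing the page.
    outlinks = [f"{url}?link={i}" for i in range(count)]
    # The limit is applied per page, not across the whole fetch phase.
    kept[url] = len(limit_outlinks(outlinks))

total_kept = sum(kept.values())
print(kept)
print(total_kept)
```

Under this assumption no page in the example reaches the limit, so all 90 + 80 + 50 outlinks are kept. Only a page with more than 100 outlinks would be cut.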
Stefan

On 07.09.2005 at 18:43, Jack Tang wrote:

> Hi All
>
> Here is the "db.max.outlinks.per.page" property and its description in
> nutch-default.xml
>     <property>
>       <name>db.max.outlinks.per.page</name>
>       <value>100</value>
>       <description>The maximum number of outlinks that we'll process for a page.
>       </description>
>     </property>
>
> I don't think the description is right.
> Say, my crawler feeds are:
> http://www.a.com/index.php (90 outlinks)
> http://www.b.com/index.jsp  (80 outlinks)
> http://www.c.com/index.html (50 outlinks)
>
> and the number of crawler threads is 30. Do you think the remaining URLs
> ((80 - 10) outlinks + 50 outlinks) will be fetched?
>
> I think the description should be "The maximum number of outlinks in
> one fetching phase."
>
>
> Regards
> /Jack
> -- 
> Keep Discovering ... ...
> http://www.jroller.com/page/jmars
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net
