nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject Re: "db.max.outlinks.per.page" is misunderstood?
Date Wed, 07 Sep 2005 17:30:45 GMT
Yes, Stefan.
But it missed some URLs. After I set the value to 3000, everything was OK.
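
For anyone following along, the change described above amounts to overriding the default of 100. A minimal sketch, assuming the override goes in conf/nutch-site.xml (this message does not say which file was edited) and that site settings take precedence over nutch-default.xml, as is usual for Nutch. Only the value 3000 comes from this message:

```xml
<!-- Assumed location: inside the root element of conf/nutch-site.xml -->
<property>
  <name>db.max.outlinks.per.page</name>
  <!-- 3000 is the value reported to work in this thread; the default is 100 -->
  <value>3000</value>
  <description>The maximum number of outlinks that we'll process for a page.
  </description>
</property>
```

Per Stefan's reply, the limit is applied to each page separately, not to a whole fetching phase.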

/Jack

On 9/8/05, Stefan Groschupf <sg@media-style.com> wrote:
> Jack,
> That is max outlinks per html page.
> All your example pages have fewer than 100 outlinks, right?
> Stefan
> 
> On 07.09.2005 at 18:43, Jack Tang wrote:
> 
> > Hi All
> >
> > Here is the "db.max.outlinks.per.page" property and its description in
> > nutch-default.xml
> >     <property>
> >       <name>db.max.outlinks.per.page</name>
> >       <value>100</value>
> >       <description>The maximum number of outlinks that we'll
> >       process for a page.
> >       </description>
> >     </property>
> >
> > I don't think the description is right.
> > Say, my crawler feeds are:
> > http://www.a.com/index.php (90 outlinks)
> > http://www.b.com/index.jsp  (80 outlinks)
> > http://www.c.com/index.html (50 outlinks)
> >
> > and the number of crawler threads is 30. Do you think the remaining URLs
> > ((80 - 10) outlinks + 50 outlinks) will be fetched?
> >
> > I think the description should be "The maximum number of outlinks in
> > one fetching phase."
> >
> >
> > Regards
> > /Jack
> > --
> > Keep Discovering ... ...
> > http://www.jroller.com/page/jmars
> >
> >
> 
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
> 
> 
> 
> 


-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars
