nutch-dev mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: db.ignore.internal.links and ranking algorithms
Date Wed, 07 Nov 2007 20:57:50 GMT
Well, the short answer is that it doesn't.  Even if you set internal
links to be ignored, they are still counted in the OPIC scoring, and
this negatively affects search relevancy.  The way to handle this is to
set the db.score.link.internal property to 0.0.  This way only external
links are counted in OPIC.
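
As a rough sketch, the override could look something like the fragment
below. It assumes local overrides go in conf/nutch-site.xml and that
your Nutch version defines db.score.link.internal in nutch-default.xml;
please check the description there before relying on it. The comment and
description text are illustrative, not copied from the distribution.

```xml
<?xml version="1.0"?>
<!-- Assumed file: conf/nutch-site.xml (site-specific overrides) -->
<configuration>
  <property>
    <name>db.score.link.internal</name>
    <!-- 0.0 so that links within the same site add nothing to OPIC scores -->
    <value>0.0</value>
    <description>Illustrative override: score factor applied to internal
    links during OPIC scoring.</description>
  </property>
</configuration>
```

Scores already stored in the crawl db were computed with the old value,
so a change like this would most likely only take full effect after a
fresh crawl.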

I will post a wiki entry about this process soon.

Dennis Kubes

karthik085 wrote:
> Hi,
> 
> I was wondering how db.ignore.internal.links affects rankings under
> the PageRank and OPIC algorithms.  I searched the forum but couldn't
> get a clear-cut answer.
> 
> I am using Nutch 0.7.2 to crawl & index a handful of sites. One site has
> a lot of pages and interlinks - around 1/3 of my total pages are from this
> site - hence, when I search for something and hit 'Show All Hits', the
> first 5-10 pages are from this site before any results from other sites
> are shown. How will db.ignore.internal.links help in this case?
> 
> Of course, I will have to recrawl with nutch-0.9 to use the OPIC algorithm...:-(
> 
> Thanks.
