nutch-dev mailing list archives

From Orkunt Sabuncu <>
Subject Fwd: links in db and pagerank calculation
Date Tue, 12 Jul 2005 11:43:19 GMT

I found a setting that solves my first problem. Setting
db.ignore.internal.links to false makes Nutch keep all the links within a web site.

I still couldn't find any clue about the second one. Why does the Nutch page
analysis module compute contributionForOutlinkers? There is nothing like this in
the usual PageRank algorithm. Any ideas about this? I am forwarding the first mail
sent to nutch-user.

Thanks in advance,

----------  Forwarded Message  ----------

Subject: links in db and pagerank calculation
Date: Monday 11 July 2005 11:17
From: Orkunt Sabuncu <>


Let's say we have a site with a diamond-like link structure. There are 4 pages:
 r (root), 1, 2, and 3. r has outlinks to 1 and 2, and both 1 and 2 have
 outlinks to 3. When we crawl this site, the webdb ignores the link
 from 2 to 3. At the end there are only 3 links in the db: two from r pointing
 to 1 and 2, and one from 1 to 3.

This will surely affect the PageRank calculations. Is this a bug, or am I
misunderstanding something?
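
To illustrate the effect, here is a minimal Python sketch of basic PageRank by
power iteration, run on the full diamond and on the three links that end up in
the db. This is not Nutch's code: the function name pagerank, the damping
factor of 0.85, the iteration count, and the choice to spread the rank of pages
without outlinks evenly over all pages are my own assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration (hypothetical helper, not Nutch code)."""
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: rank held by pages with no outlinks is shared by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal parts to the pages it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# The diamond as it exists on the site: r -> 1, r -> 2, 1 -> 3, 2 -> 3.
full_diamond = {"r": ["1", "2"], "1": ["3"], "2": ["3"], "3": []}
# The links found in the db after the crawl: the link 2 -> 3 is missing.
stored_in_db = {"r": ["1", "2"], "1": ["3"], "2": [], "3": []}

full_scores = pagerank(full_diamond)
db_scores = pagerank(stored_in_db)

for page in ["r", "1", "2", "3"]:
    print(page, round(full_scores[page], 4), round(db_scores[page], 4))
```

Under these assumptions page 3 gets a lower score when the link from 2 to 3 is
missing, so the two link sets do not give the same ranking values.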

Also, in the link analysis module there are
some extra score contributions named contributionForOutlinkers. This
contribution considers the links to pages which also have links to other
pages. I couldn't find references to this way of calculating PageRank in the
literature. Basic PageRank calculation considers only the outlinks. Nutch's
way of calculation will produce different scores from the basic PageRank
calculation. So, what is the use of the contribution for outlinkers? Do you have
any ideas or references that explain this?

I am using Nutch-0.6.


