nutch-dev mailing list archives

From "Dennis Kubes (JIRA)" <>
Subject [jira] Commented: (NUTCH-635) LinkAnalysis Tool for Nutch
Date Sun, 15 Jun 2008 16:10:45 GMT


Dennis Kubes commented on NUTCH-635:

Andrzej Bialecki wrote:

> One more question: you said the algorithm converges, but do you have a reference set
> of values from this dataset, calculated using some other pagerank impl? It would be worthwhile
> to make sure that the values are indeed the PageRank, as described, and not yet
> another subtle variation such as our OPIC

I was doing it the low-tech way.  If you turn on debug logging (be warned, the output is
large) and grep the log, you can see the scores converge after a few iterations ;)
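
For anyone who wants something more systematic than eyeballing grep output, here is a
rough sketch of a convergence check.  It is not part of the patch.  It assumes the
per-iteration scores have already been pulled out of the log into one dict per iteration
(url -> score); the function names and the tolerance are made up for illustration.

```python
def iteration_deltas(history):
    """Largest absolute score change between each pair of consecutive iterations.

    `history` is a list of dicts, one per iteration, mapping url -> score.
    (Hypothetical input format; how the scores are extracted is up to you.)
    """
    deltas = []
    for prev, curr in zip(history, history[1:]):
        # A url missing from the previous iteration is treated as score 0.0.
        deltas.append(max(abs(curr[url] - prev.get(url, 0.0)) for url in curr))
    return deltas


def converged(history, tol=1e-3):
    """True if the most recent iteration moved every score by less than `tol`."""
    deltas = iteration_deltas(history)
    return bool(deltas) and deltas[-1] < tol
```

If the deltas shrink from one iteration to the next and drop below the tolerance, the
scores have settled.  If they keep growing, something is off.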

> There are a few Java packages for computing PageRank, we could adapt one of those to
> serve as a baseline:

I agree it would be a good comparison.  Strictly speaking, though, it is not just PageRank.
There are optimizations for multiple links from a given domain, penalties for very few
inlinks, and a minimum score value, all of which can be changed through the configuration.
Apart from those, it follows the original PageRank algorithm closely.
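
To make the comparison concrete, a baseline could be as small as the sketch below.  This
is plain textbook PageRank by power iteration, not the code in the patch.  It leaves out
the domain link handling, the inlink penalties, and the minimum score on purpose, so any
difference from the tool's output would include the effect of those settings.  The
function name, the damping factor of 0.85, and the way dangling pages are handled are my
assumptions here.

```python
def pagerank(outlinks, damping=0.85, iterations=50):
    """Plain PageRank over a dict mapping each page to a list of pages it links to."""
    # Collect every page that appears as a source or as a link target.
    pages = set(outlinks)
    for targets in outlinks.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution that sums to 1.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Score held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(scores[page] for page in pages if not outlinks.get(page))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {page: base for page in pages}

        # Each page passes an equal share of its damped score along every outlink.
        for src, targets in outlinks.items():
            if not targets:
                continue
            share = damping * scores[src] / len(targets)
            for dst in targets:
                new_scores[dst] += share

        scores = new_scores
    return scores
```

Running this on a small slice of the link graph and comparing the ranking order with the
tool's output, rather than the absolute values, is probably the more useful check, since
the two are not normalized the same way.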

> LinkAnalysis Tool for Nutch
> ---------------------------
>                 Key: NUTCH-635
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 1.0.0
>         Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, NUTCH-635-3-20080614.patch,
> This is a basic PageRank-type link analysis tool for Nutch which simulates a sparse matrix
> using inlinks and outlinks and converges after a given number of iterations.  This tool is
> meant to replace the current scoring system in Nutch with a system that converges instead of
> exponentially increasing scores.  Also includes a tool to create an outlinkdb.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
