nutch-dev mailing list archives

From David Saile <da...@uni-koblenz.de>
Subject OPICScoringFilter always increasing a fetched site's score
Date Tue, 01 Feb 2011 15:15:57 GMT
Hi all,

I have a question concerning updating a site's score in Nutch 1.2.

In the reduce method of org.apache.nutch.crawl.CrawlDbReducer I found a call to
	scfilters.updateDbScore((Text)key, oldSet ? old : null, result, linkList);

During debugging, I discovered that this call ends up in the
org.apache.nutch.scoring.opic.OPICScoringFilter class. The code for this method is the following:
  /** Increase the score by a sum of inlinked scores. */
  public void updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List inlinked)
      throws ScoringFilterException {
    float adjust = 0.0f;
    for (int i = 0; i < inlinked.size(); i++) {
      CrawlDatum linked = (CrawlDatum)inlinked.get(i);
      adjust += linked.getScore();
    }
    if (old == null) old = datum;
    datum.setScore(old.getScore() + adjust);
  }

As I understand it, this code increases a site's score by the sum of its inlink scores every time
the site is crawled. So even if the site has not been modified and no new inlink has been discovered,
the site's score will still increase.
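
To make the concern concrete, here is a minimal sketch of the arithmetic. It is not Nutch code: Datum is a hypothetical stand-in for CrawlDatum that models only the score, scoreAfter is a helper invented for illustration, and the sketch assumes that the same inlink contributions are passed to updateDbScore on every update, which may not match what CrawlDbReducer actually passes.

```java
import java.util.Arrays;
import java.util.List;

public class OpicUpdateSketch {

    /** Hypothetical stand-in for Nutch's CrawlDatum; only the score is modeled. */
    static class Datum {
        private float score;
        Datum(float score) { this.score = score; }
        float getScore() { return score; }
        void setScore(float score) { this.score = score; }
    }

    /** Same arithmetic as the quoted updateDbScore, without the Nutch types. */
    static void updateDbScore(Datum old, Datum datum, List<Datum> inlinked) {
        float adjust = 0.0f;
        for (Datum linked : inlinked) {
            adjust += linked.getScore();
        }
        if (old == null) old = datum;
        datum.setScore(old.getScore() + adjust);
    }

    /**
     * Applies the update `rounds` times with an unchanged inlink list
     * (an assumption of this sketch) and returns the resulting score.
     */
    static float scoreAfter(int rounds, float initial, List<Datum> inlinked) {
        Datum old = null;                   // no stored entry before the first update
        Datum datum = new Datum(initial);
        for (int i = 0; i < rounds; i++) {
            updateDbScore(old, datum, inlinked);
            old = new Datum(datum.getScore());   // stored entry is "old" next round
            datum = new Datum(old.getScore());   // next result starts from the stored score
        }
        return old == null ? initial : old.getScore();
    }

    public static void main(String[] args) {
        List<Datum> inlinked = Arrays.asList(new Datum(0.5f), new Datum(0.25f));
        for (int rounds = 0; rounds <= 3; rounds++) {
            System.out.println(rounds + " update(s): " + scoreAfter(rounds, 1.0f, inlinked));
        }
    }
}
```

With an initial score of 1.0 and inlink scores of 0.5 and 0.25, the score grows by 0.75 per update (1.0, 1.75, 2.5, 3.25), although nothing about the page or its inlinks has changed.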

Is my understanding of this mechanism correct? 
If so, could anyone explain why a site's score is increased in every case? I would
expect it to change only if either its content has changed or a new inlink has been discovered.

Cheers
David



