nutch-dev mailing list archives

From "Doğacan Güney" <doga...@gmail.com>
Subject Re: Creating a new scoring filter.
Date Tue, 27 Feb 2007 16:12:53 GMT
Hi,

On 2/27/07, Nicolás Lichtmaier <nick@reloco.com.ar> wrote:

[snip]

>
> It doesn't seem a good way to do it. What if there are no outlinks? This
> method won't be called at all. And anyway, it would be called once per
> each outlink, which would multiplicate the work.

The duplicated work is easy to avoid, but you are right that it won't
work if there are no outlinks.

Maybe the scoring filter API should change? A distributeScoreToOutlinks
method, which would be called even if there are no outlinks, may be
more useful than the current one:

CrawlDatum distributeScoreToOutlinks(Text fromUrl, List<String> toUrlList,
    List<CrawlDatum> datumList, ParseData parseData, CrawlDatum adjust)

This method gives more control to the plugin: knowing all the outlinks,
the plugin can make more informed decisions. For example, right now
there is no way a scoring filter can be sure that it has distributed
all its cash (e.g., if db.score.internal.link is 0.5 and
db.score.external.link is 1.0, the filter will almost always distribute
less than its cash).
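
To make the cash argument concrete, here is a rough sketch in plain Java
with no Nutch classes. It assumes each outlink gets (cash / n) times its
link factor when links are scored one at a time; the class and method
names are made up for illustration and are not part of the existing API.
The second method shows what a filter could do if it saw all outlinks
together: normalize the weights so the shares add up to the full cash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScoreDistributionSketch {

    // Assumed stand-ins for db.score.internal.link and db.score.external.link.
    static final double INTERNAL_FACTOR = 0.5;
    static final double EXTERNAL_FACTOR = 1.0;

    static double factor(boolean internal) {
        return internal ? INTERNAL_FACTOR : EXTERNAL_FACTOR;
    }

    /** One link at a time: each share is (cash / n) * factor. */
    static List<Double> perLink(double cash, List<Boolean> isInternal) {
        List<Double> shares = new ArrayList<>();
        int n = isInternal.size();
        for (boolean internal : isInternal) {
            shares.add(cash / n * factor(internal));
        }
        return shares;
    }

    /** All links at once: weights are normalized so the shares sum to cash. */
    static List<Double> allAtOnce(double cash, List<Boolean> isInternal) {
        double totalWeight = 0.0;
        for (boolean internal : isInternal) {
            totalWeight += factor(internal);
        }
        List<Double> shares = new ArrayList<>();
        for (boolean internal : isInternal) {
            shares.add(cash * factor(internal) / totalWeight);
        }
        return shares;
    }

    static double sum(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three internal links and one external link, cash of 1.0.
        List<Boolean> links = Arrays.asList(true, true, true, false);
        System.out.println("per link total:    " + sum(perLink(1.0, links)));
        System.out.println("all at once total: " + sum(allAtOnce(1.0, links)));
    }
}
```

With three internal links and one external link, the per-link scheme
hands out less than the page's cash, while the normalized scheme hands
out all of it. Whether to normalize is a policy choice; the point is
only that the decision needs the whole outlink list.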

This will also work for your case, since you will just ignore the
outlinks and return the adjust datum based on information in parse
metadata.
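
Roughly, a filter for that case might look like the sketch below. It
uses stand-in types so it compiles on its own: String and Map replace
Text and ParseData, Datum is a toy replacement for CrawlDatum, and the
"x-boost" metadata key is invented for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MetadataOnlyFilterSketch {

    /** Toy stand-in for CrawlDatum: it only carries a score. */
    static final class Datum {
        float score;

        Datum(float score) {
            this.score = score;
        }
    }

    /**
     * Stand-in for the proposed method. The outlink lists are ignored,
     * so the method behaves the same whether or not the page has outlinks.
     */
    static Datum distributeScoreToOutlinks(String fromUrl, List<String> toUrlList,
            List<Datum> datumList, Map<String, String> parseMeta, Datum adjust) {
        String boost = parseMeta.get("x-boost"); // hypothetical metadata key
        if (boost == null) {
            return adjust; // nothing in the metadata, so nothing to adjust
        }
        Datum result = (adjust != null) ? adjust : new Datum(0.0f);
        result.score += Float.parseFloat(boost);
        return result;
    }

    public static void main(String[] args) {
        // A page with no outlinks at all still gets its adjust datum.
        Datum adjust = distributeScoreToOutlinks("http://example.com/",
                Collections.emptyList(), Collections.emptyList(),
                Collections.singletonMap("x-boost", "2.5"), null);
        System.out.println(adjust.score);
    }
}
```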

What do you (and others) think?

>
> Thanks!
>
>


-- 
Doğacan Güney