nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2039) Relevance based scoring filter
Date Thu, 18 Jun 2015 05:02:00 GMT


Lewis John McGibbney commented on NUTCH-2039:

The most recent PR does not account for the change in package naming: its package imports were not updated to match.

I've got a local dirty copy of this patch in which I've implemented the changes. I'll commit
it to trunk right now, as the patch is good and has been improved in line with everyone's code.
Excellent work [~sujenshah]!

> Relevance based scoring filter
> ------------------------------
>                 Key: NUTCH-2039
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Sujen Shah
>            Assignee: Lewis John McGibbney
>              Labels: memex, nutch
>             Fix For: 1.11
> A ScoringFilter plugin that uses a similarity measure to calculate the similarity between
> a given page (the gold standard) and the currently parsed page. The score obtained from this
> similarity is then distributed to the parsed page's outlinks. This filter aims to focus the
> crawler on relevant pages.
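The issue describes the filter only at a high level. The following is a minimal, standalone sketch of the idea under stated assumptions: it uses bag-of-words cosine similarity as the similarity measure and splits the parsed page's score evenly across its outlinks. Both choices, as well as the class and method names (`RelevanceSketch`, `termFrequencies`, `cosineSimilarity`, `distributeToOutlinks`), are hypothetical. This is not the NUTCH-2039 plugin code and does not implement Nutch's `ScoringFilter` interface.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the NUTCH-2039 implementation.
public class RelevanceSketch {

    // Build a term-frequency vector: lowercase, then split on runs of non-letter/non-digit chars.
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Assumed similarity measure: cosine similarity; returns 0.0 if either vector is empty.
    public static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    private static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) {
            sum += (double) x * x;
        }
        return Math.sqrt(sum);
    }

    // Assumed distribution rule: split the page score evenly across its outlinks.
    public static Map<String, Double> distributeToOutlinks(double pageScore, List<String> outlinks) {
        Map<String, Double> scores = new HashMap<>();
        if (outlinks.isEmpty()) {
            return scores;
        }
        double share = pageScore / outlinks.size();
        for (String url : outlinks) {
            scores.merge(url, share, Double::sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        String goldStandard = "deep web crawler focused crawling relevance";
        String parsedPage = "a focused crawler ranks pages by relevance";
        double score = cosineSimilarity(termFrequencies(goldStandard), termFrequencies(parsedPage));
        Map<String, Double> outlinkScores =
            distributeToOutlinks(score, List.of("http://example.com/a", "http://example.com/b"));
        System.out.printf("page score = %.4f%n", score);
        System.out.println("outlink scores = " + outlinkScores);
    }
}
```

An even split is only one possible design; a real filter might instead weight outlinks (for example by anchor-text similarity) or combine the relevance score with the page's existing score.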

This message was sent by Atlassian JIRA
