nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2039) Relevance based scoring filter
Date Thu, 18 Jun 2015 05:38:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591287#comment-14591287
] 

Lewis John McGibbney commented on NUTCH-2039:
---------------------------------------------

I actually have no idea how in God's name to apply patches from GitHub, tbh. Every time I try
to merge this in from your remote branch, the result is completely messed up.
Downloading the patch and applying it locally gives me conflicts on IndexingJob
and many other classes. Additionally, the GitHub patch cannot be applied cleanly against
a local SVN working copy, so that gets messed up too.
If someone can provide me with details on how to associate the GitHub trunk branch, then I'll
happily do this.
Some guidance would be appreciated.

> Relevance based scoring filter
> ------------------------------
>
>                 Key: NUTCH-2039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2039
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Sujen Shah
>            Assignee: Lewis John McGibbney
>              Labels: memex, nutch
>             Fix For: 1.11
>
>
> A ScoringFilter plugin that uses a similarity measure to calculate the similarity between
> a given page (the gold standard) and the currently parsed page. The score obtained from this
> similarity is then distributed to the parsed page's outlinks. This filter aims to focus the
> crawler on crawling/exploring relevant pages.
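
The idea in the description above can be sketched roughly as follows. This is a minimal, standalone illustration, not the code from the patch and not the Nutch ScoringFilter API. It assumes cosine similarity over raw term counts as the similarity measure and assumes each outlink inherits the parent page's score unchanged; the issue text specifies neither. The class and method names (RelevanceSketch, termFrequencies, cosine, distribute) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: not part of Nutch and not taken from the NUTCH-2039 patch.
class RelevanceSketch {

    // Builds a term-count vector by lowercasing and splitting on non-word characters.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of two term-count vectors; 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Assumption: every outlink receives the parent's score as-is (not divided among them).
    static Map<String, Double> distribute(double score, List<String> outlinks) {
        Map<String, Double> scores = new HashMap<>();
        for (String url : outlinks) {
            scores.put(url, score);
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Integer> gold = termFrequencies("focused web crawler for relevant pages");
        Map<String, Integer> page = termFrequencies("a crawler that explores relevant web pages");
        double score = cosine(gold, page);
        // Example URLs are placeholders.
        List<String> outlinks = Arrays.asList("http://example.org/a", "http://example.org/b");
        System.out.println(distribute(score, outlinks));
    }
}
```

A page that shares many terms with the gold standard scores close to 1, so its outlinks would be prioritised; a page with no shared terms scores 0. A real plugin would more likely weight terms (for example with TF-IDF) and remove stop words before comparing.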



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
