nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-1958) Remove scoring-opic from nutch-default.xml
Date Thu, 23 Apr 2015 21:43:39 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1958:
----------------------------------------
    Fix Version/s:     (was: 1.10)
                   1.11

> Remove scoring-opic from nutch-default.xml
> ------------------------------------------
>
>                 Key: NUTCH-1958
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1958
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.3, 1.9
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 2.4, 1.11
>
>
> I propose we remove scoring-opic from nutch-default. We all know it is flawed for any
> kind of incremental crawl, which most of us do. It is also useless for a single crawl:
> if you must crawl all records of a domain, using OPIC to prioritize URLs makes no sense.
> It also confuses users, as we have seen in the past and recently [1].
> What do you think?
> [1]: http://lucene.472066.n3.nabble.com/Nutch-documents-have-huge-scores-in-Solr-td4192064.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
