nutch-dev mailing list archives

From "Kim Whitehall (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2125) Metrics
Date Mon, 28 Sep 2015 15:06:04 GMT
Kim Whitehall created NUTCH-2125:
------------------------------------

             Summary: Metrics
                 Key: NUTCH-2125
                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
             Project: Nutch
          Issue Type: Improvement
          Components: tool
    Affects Versions: 1.10
            Reporter: Kim Whitehall


Purpose: a metric for determining the “relevancy” of a crawl after each round and the
“relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms
will be stored.

- Return the topN terms per page

- Return the topN terms per segment, based on tf-idf

- Leverage Apache Lucene libs
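
A rough sketch of what the two topN lists could look like, given here only to make the proposal concrete. It is not the Nutch or Lucene implementation: the function names, the whitespace tokenizer, the smoothed idf formula, and the choice to rank segment terms by summing per-page tf-idf scores are all assumptions made for illustration. Only the default of 25 terms comes from the description above.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer that keeps alphabetic
    # tokens only. A real implementation would use a Lucene analyzer.
    return [t for t in text.lower().split() if t.isalpha()]


def top_terms_per_page(pages, top_n=25):
    """Map each page id to its top_n (term, tf-idf score) pairs.

    `pages` maps a page id (e.g. a URL) to its parsed text.
    Passing top_n=None returns every term of the page.
    """
    term_counts = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    n_pages = len(pages)

    # Document frequency: number of pages in which each term occurs.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())

    result = {}
    for pid, counts in term_counts.items():
        total = sum(counts.values()) or 1
        scores = {
            # tf is the relative frequency of the term in the page;
            # idf is smoothed so that it is never zero (an assumption).
            term: (count / total) * (math.log((1 + n_pages) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[pid] = ranked[:top_n]
    return result


def top_terms_per_segment(pages, top_n=25):
    """Rank terms over the whole segment by summing per-page tf-idf scores."""
    totals = Counter()
    for ranked in top_terms_per_page(pages, top_n=None).values():
        for term, score in ranked:
            totals[term] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Here a "segment" is modelled simply as a dict of page id to text, for example top_terms_per_segment({"a": "nutch crawl crawl metrics", "b": "nutch lucene index"}, top_n=2). Reading real segment data is left out of the sketch.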



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
