nutch-dev mailing list archives

From "Kim Whitehall (JIRA)" <>
Subject [jira] [Created] (NUTCH-2125) Metrics
Date Mon, 28 Sep 2015 15:06:04 GMT
Kim Whitehall created NUTCH-2125:

             Summary: Metrics
                 Key: NUTCH-2125
             Project: Nutch
          Issue Type: Improvement
          Components: tool
    Affects Versions: 1.10
            Reporter: Kim Whitehall

Purpose: a metric for determining the “relevancy” of a crawl after each round and the
“relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms
will be stored.

- Return the topN terms per page

- Return the topN terms per segment, based on TF-IDF

- Leverage Apache Lucene libs
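As a rough illustration of the two "topN" items above, here is a minimal plain-Java sketch. It is not the code proposed for this issue and does not use Lucene's API. Several parts are assumptions: tokenization is simple lowercasing plus splitting on non-alphanumeric characters (a Lucene analyzer would normally handle this), a page's terms are ranked by raw term frequency, and a segment's terms are ranked by the sum over its pages of tf × ln(N/df). The names `TopTerms`, `topTermsForPage` and `topTermsForSegment` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class; a sketch, not Nutch or Lucene code.
class TopTerms {
    // Default taken from the issue: store the first 25 terms.
    static final int DEFAULT_TOP_N = 25;

    // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Raw term frequencies for a single page.
    static Map<String, Double> termFrequencies(String page) {
        Map<String, Double> tf = new HashMap<>();
        for (String t : tokenize(page)) {
            tf.merge(t, 1.0, Double::sum);
        }
        return tf;
    }

    // Keep the n highest-scoring terms; ties are broken alphabetically for stable output.
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    // Per-page top terms, ranked by raw term frequency.
    static List<String> topTermsForPage(String page, int n) {
        return topN(termFrequencies(page), n);
    }

    // Per-segment top terms (assumed scoring): score(t) = sum over pages d of
    // tf(t, d) * ln(N / df(t)), where N = pages in the segment and df(t) = pages containing t.
    static List<String> topTermsForSegment(List<String> pages, int n) {
        List<Map<String, Double>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String page : pages) {
            Map<String, Double> tf = termFrequencies(page);
            tfs.add(tf);
            for (String term : tf.keySet()) {
                df.merge(term, 1, Integer::sum);
            }
        }
        int numPages = pages.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map<String, Double> tf : tfs) {
            for (Map.Entry<String, Double> e : tf.entrySet()) {
                double idf = Math.log((double) numPages / df.get(e.getKey()));
                scores.merge(e.getKey(), e.getValue() * idf, Double::sum);
            }
        }
        return topN(scores, n);
    }

    public static void main(String[] args) {
        List<String> segment = List.of(
                "Nutch crawls the web. Nutch fetches pages.",
                "Lucene indexes text. Nutch uses Lucene.",
                "Solr builds on Lucene.");
        // prints [nutch, crawls, fetches, pages, the, web]
        System.out.println(topTermsForPage(segment.get(0), DEFAULT_TOP_N));
        // prints [lucene, nutch, builds]
        System.out.println(topTermsForSegment(segment, 3));
    }
}
```

With ln(N/df), a term that occurs on every page of a segment scores zero and never appears in the segment's topN. A smoothed idf would be a reasonable alternative if such terms should still count toward a crawl's "relevancy".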

This message was sent by Atlassian JIRA