nutch-dev mailing list archives

From "Kim Whitehall (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-2125) Metrics
Date Mon, 28 Sep 2015 15:12:04 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kim Whitehall updated NUTCH-2125:
---------------------------------
    Description: 
Purpose: a metric for determining the “relevancy” of a crawl after each round and the
“relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms
will be stored.

- Return the topN terms per page

- Return the topN terms per segment, based on tf-idf

- Leverage Apache Lucene libs

  was:
Purpose: a metric for determining the “relevancy” of a crawl after each round and the
“relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms
will be stored.

- Return the topN terms per page

- Return the topN terms per segment, based on td-idf

- Leverage Apache Lucene libs


> Metrics
> -------
>
>                 Key: NUTCH-2125
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 1.10
>            Reporter: Kim Whitehall
>              Labels: memex
>
> Purpose: a metric for determining the “relevancy” of a crawl after each round
> and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first
> 25 terms will be stored.
> - Return the topN terms per page
> - Return the topN terms per segment, based on tf-idf
> - Leverage Apache Lucene libs
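
The two "topN" items above can be illustrated with a small, hedged sketch. This is not the NUTCH-2125 implementation and it uses neither Nutch nor Lucene: it is plain Python, the function names (tokenize, top_terms_per_page, top_terms_per_segment) are hypothetical, the whitespace tokenizer is a stand-in for a real analyzer, and the idf formula (log(N/df) + 1) is just one common variant. It treats a segment as a list of page texts and sums each term's tf-idf over the pages, which is an assumption about how "per segment" might be scored.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer; a real analyzer
    # would also handle punctuation, stop words, and stemming.
    return text.lower().split()


def top_terms_per_page(text, n=25):
    # Rank the terms of a single page by raw term frequency.
    counts = Counter(tokenize(text))
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_terms_per_segment(pages, n=25):
    # Rank the terms of a segment (here: a list of page texts) by tf-idf,
    # summed over all pages in the segment.
    docs = [Counter(tokenize(page)) for page in pages]
    num_docs = len(docs)

    # Document frequency: number of pages that contain each term.
    df = Counter()
    for doc in docs:
        df.update(doc.keys())

    scores = Counter()
    for doc in docs:
        for term, tf in doc.items():
            idf = math.log(num_docs / df[term]) + 1.0  # one common idf variant
            scores[term] += tf * idf

    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    segment = [
        "nutch crawl crawl page",
        "nutch index page",
        "nutch crawl lucene",
    ]
    print(top_terms_per_page(segment[0], n=2))
    print(top_terms_per_segment(segment, n=2))
```

The default of n=25 mirrors the "first 25 terms" default mentioned in the description; everything else is illustrative only.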



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
