nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model
Date Sun, 03 Apr 2016 02:18:25 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223104#comment-15223104 ]

ASF GitHub Bot commented on NUTCH-2245:
---------------------------------------

Github user sujen1412 commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/101#discussion_r58303167
  
    --- Diff: src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/Model.java ---
    @@ -115,6 +126,7 @@ public static DocVector createDocVector(String content) {
           tStream.reset();
           while(tStream.incrementToken()) {
             String term = charTermAttribute.toString();
    +        LOG.info(term);
    --- End diff --
    
    This looks like it is used for debugging; please change it to LOG.debug(). That helps keep the log clean.
    Thanks!
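
    To illustrate the point of the request, here is a minimal, self-contained sketch. It is not the Nutch code: it uses java.util.logging so that it runs without dependencies, with Level.FINE standing in for debug level. The class name DebugLoggingSketch, the logger name, and the helpers logTerms and emittedAt are all hypothetical. It shows that per-term messages logged at a debug-like level are dropped when the logger runs at INFO and appear only when debugging is enabled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DebugLoggingSketch {
  // Hypothetical logger name; java.util.logging is a stand-in chosen for self-containment.
  private static final Logger LOG = Logger.getLogger("sketch.cosine.Model");

  /** Logs each term at FINE, the debug-like level in java.util.logging. */
  static void logTerms(String[] terms) {
    for (String term : terms) {
      LOG.fine(term);
    }
  }

  /** Runs logTerms with the logger at the given level and returns the messages that got through. */
  static List<String> emittedAt(Level level, String[] terms) {
    final List<String> seen = new ArrayList<>();
    Handler capture = new Handler() {
      @Override public void publish(LogRecord record) { seen.add(record.getMessage()); }
      @Override public void flush() { }
      @Override public void close() { }
    };
    LOG.setUseParentHandlers(false); // keep output out of the console handler
    LOG.setLevel(level);
    LOG.addHandler(capture);
    try {
      logTerms(terms);
    } finally {
      LOG.removeHandler(capture);
    }
    return seen;
  }

  public static void main(String[] args) {
    String[] terms = {"apache", "nutch"};
    System.out.println("at INFO: " + emittedAt(Level.INFO, terms).size() + " messages");
    System.out.println("at FINE: " + emittedAt(Level.FINE, terms).size() + " messages");
  }
}
```

    With a per-token statement inside the tokenizer loop, the INFO variant would write one line per term of every document, which is why a debug level is the usual choice here.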


> Developed the NGram Model on the existing Unigram Cosine Similarity Model
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2245
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2245
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>            Reporter: Bhavya Sanghavi
>            Assignee: Sujen Shah
>            Priority: Minor
>              Labels: memex
>
> Builds on the existing unigram cosine similarity model by adding an n-gram model, letting the user choose the window size used to score the similarity between webpages and the gold standard.
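
The idea in the description can be sketched as follows. This is a simplified illustration, not the plugin's implementation: it splits on whitespace instead of using a Lucene analyzer, uses raw term counts, and the names NGramCosineSketch, ngrams, termFreq, cosine, minN and maxN are hypothetical. The window size is given as an inclusive range of n-gram lengths, so minN = maxN = 1 reduces to the unigram model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NGramCosineSketch {

  /** Returns all word n-grams of every length from minN to maxN (inclusive). */
  static List<String> ngrams(String[] tokens, int minN, int maxN) {
    List<String> grams = new ArrayList<>();
    for (int n = minN; n <= maxN; n++) {
      for (int i = 0; i + n <= tokens.length; i++) {
        grams.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
      }
    }
    return grams;
  }

  /** Builds a count vector over n-grams; whitespace tokenization is an assumption of this sketch. */
  static Map<String, Integer> termFreq(String text, int minN, int maxN) {
    String trimmed = text.toLowerCase(Locale.ROOT).trim();
    String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    Map<String, Integer> tf = new HashMap<>();
    for (String gram : ngrams(tokens, minN, maxN)) {
      tf.merge(gram, 1, Integer::sum);
    }
    return tf;
  }

  /** Cosine similarity of two count vectors; returns 0 if either vector is empty. */
  static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      int va = e.getValue();
      normA += (double) va * va;
      Integer vb = b.get(e.getKey());
      if (vb != null) {
        dot += (double) va * vb;
      }
    }
    for (int vb : b.values()) {
      normB += (double) vb * vb;
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    String gold = "open source web crawler";
    String page = "web crawler open source";
    // Same words in a different order: unigrams match fully, bigrams only partly.
    System.out.println(cosine(termFreq(gold, 1, 1), termFreq(page, 1, 1)));
    System.out.println(cosine(termFreq(gold, 2, 2), termFreq(page, 2, 2)));
  }
}
```

Comparing the two printed scores shows what the window size buys: with unigrams the reordered page is indistinguishable from the gold standard, while bigrams penalize the changed word order.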



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
