nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model
Date Fri, 01 Apr 2016 23:14:25 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222510#comment-15222510 ]

ASF GitHub Bot commented on NUTCH-2245:
---------------------------------------

Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/101#discussion_r58279977
  
    --- Diff: src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/Model.java ---
    @@ -68,6 +68,11 @@ public static synchronized void createModel(Configuration conf) throws IOExcepti
             }
             LOG.info("Loaded custom stopwords from {}",conf.get("scoring.similarity.stopword.file"));
           }
    +
    +      //Check if user has specified n for ngram cosine model
    +      int ngram = conf.getInt("scoring.similarity.ngrams", 1);
    +      LOG.info("Value of ngram: "+ngram);
    --- End diff --
    
    Please use the correct, efficient slf4j parameterized logging notation here,
    e.g. LOG.info("Value of ngram: {} ", ngram);
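The point of the placeholder form can be illustrated with a small, library-free sketch. `TinyLogger` and `Counted` below are hypothetical stand-ins written for this illustration only; they are not the slf4j API and only mimic the single-argument `{}` substitution. The sketch assumes the INFO level may be disabled and shows that string concatenation builds the message before the call, while the placeholder form defers formatting until the level check has passed.

```java
final class DeferredLogSketch {

  /** Hypothetical stand-in for a logger; NOT the slf4j API. */
  static final class TinyLogger {
    private final boolean infoEnabled;
    final StringBuilder out = new StringBuilder();

    TinyLogger(boolean infoEnabled) {
      this.infoEnabled = infoEnabled;
    }

    /** Pre-built message: the caller has already paid for the concatenation. */
    void info(String message) {
      if (infoEnabled) {
        out.append(message).append('\n');
      }
    }

    /** Placeholder form: the argument is only converted to text if INFO is enabled. */
    void info(String pattern, Object arg) {
      if (!infoEnabled) {
        return; // arg.toString() is never called on this path
      }
      int at = pattern.indexOf("{}");
      String msg = at < 0
          ? pattern
          : pattern.substring(0, at) + arg + pattern.substring(at + 2);
      out.append(msg).append('\n');
    }
  }

  /** Hypothetical value wrapper that counts how often it is turned into a string. */
  static final class Counted {
    int toStringCalls = 0;
    private final int value;

    Counted(int value) {
      this.value = value;
    }

    @Override
    public String toString() {
      toStringCalls++;
      return Integer.toString(value);
    }
  }

  public static void main(String[] args) {
    TinyLogger disabled = new TinyLogger(false);

    Counted concatenated = new Counted(2);
    disabled.info("Value of ngram: " + concatenated);

    Counted deferred = new Counted(2);
    disabled.info("Value of ngram: {}", deferred);

    System.out.println("toString calls with concatenation: " + concatenated.toStringCalls);
    System.out.println("toString calls with placeholder:   " + deferred.toStringCalls);
  }
}
```

For a single `int` the saving is small, but the placeholder form also keeps the message template constant, which makes log lines easier to search for.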


> Developed the NGram Model on the existing Unigram Cosine Similarity Model
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2245
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2245
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>            Reporter: Bhavya Sanghavi
>            Assignee: Sujen Shah
>            Priority: Minor
>              Labels: memex
>
> Extends the existing unigram cosine similarity model with an n-gram model, so that the user
> can choose the window size used when scoring the similarity between webpages and the gold
> standard.
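
To illustrate how the window size changes the score, here is a minimal, self-contained sketch of cosine similarity over word n-grams. It is not the code from the pull request: the class and method names are hypothetical, and it assumes whitespace tokenization, lowercasing, no stopword removal, and n-grams of exactly size n (n = 1 reduces to the unigram case).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class NGramCosineSketch {

  /** Counts word n-grams of exactly size n; returns an empty map if the text has fewer than n words. */
  static Map<String, Integer> ngramCounts(String text, int n) {
    Map<String, Integer> counts = new HashMap<>();
    String trimmed = text.trim().toLowerCase(Locale.ROOT);
    if (trimmed.isEmpty() || n < 1) {
      return counts;
    }
    String[] tokens = trimmed.split("\\s+");
    // Slide a window of n tokens across the text.
    for (int i = 0; i + n <= tokens.length; i++) {
      String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
      counts.merge(gram, 1, Integer::sum);
    }
    return counts;
  }

  /** Cosine similarity of two term-frequency vectors; 0.0 if either vector is empty. */
  static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      double va = e.getValue();
      normA += va * va;
      Integer vb = b.get(e.getKey());
      if (vb != null) {
        dot += va * vb;
      }
    }
    for (int v : b.values()) {
      normB += (double) v * v;
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0;
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    String goldStandard = "the quick brown fox";
    String page = "the brown quick fox";
    for (int n = 1; n <= 2; n++) {
      double score = cosine(ngramCounts(goldStandard, n), ngramCounts(page, n));
      System.out.println("n=" + n + " similarity=" + score);
    }
  }
}
```

With n = 1 the two example strings contain the same words and score as identical. With n = 2 they share no bigram, so a larger window makes the score sensitive to word order.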



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
