nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model
Date Fri, 01 Apr 2016 23:15:25 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222511#comment-15222511 ]

ASF GitHub Bot commented on NUTCH-2245:
---------------------------------------

Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/101#discussion_r58280048
  
    --- Diff: src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/util/LuceneTokenizer.java ---
    @@ -88,6 +89,12 @@ public TokenStream getTokenStream() {
         return tokenStream;
       }
       
    +  public LuceneTokenizer(String content, TokenizerType tokenizer, StemFilterType stemFilterType, int ngram) {
    --- End diff --
    
    No Javadoc?


> Developed the NGram Model on the existing Unigram Cosine Similarity Model
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2245
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2245
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>            Reporter: Bhavya Sanghavi
>            Assignee: Sujen Shah
>            Priority: Minor
>              Labels: memex
>
> Extends the existing unigram cosine similarity model with an n-gram model, so that the
> user can choose the window size used when scoring the similarity between web pages and
> the gold standard.
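
To illustrate the idea in the description above, here is a minimal, self-contained sketch of cosine similarity over word n-grams with a user-chosen window size. It is not the code from the pull request: the class name NGramCosineSketch, the methods ngramCounts and cosine, and the whitespace tokenization are all assumptions made for illustration, and the actual plugin tokenizes through its LuceneTokenizer class instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; names and tokenization are assumptions,
// not the scoring-similarity plugin's implementation.
public class NGramCosineSketch {

  /** Counts word n-grams of window size n in lower-cased, whitespace-split text. */
  public static Map<String, Integer> ngramCounts(String text, int n) {
    Map<String, Integer> counts = new HashMap<>();
    String trimmed = text.toLowerCase().trim();
    if (trimmed.isEmpty() || n < 1) {
      return counts; // nothing to count
    }
    String[] tokens = trimmed.split("\\s+");
    // Slide a window of n tokens over the text and count each joined n-gram.
    for (int i = 0; i + n <= tokens.length; i++) {
      String gram = String.join(" ", Arrays.copyOfRange(tokens, i, i + n));
      counts.merge(gram, 1, Integer::sum);
    }
    return counts;
  }

  /** Cosine similarity between two sparse n-gram count vectors. */
  public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (Map.Entry<String, Integer> e : a.entrySet()) {
      normA += (double) e.getValue() * e.getValue();
      Integer other = b.get(e.getKey());
      if (other != null) {
        dot += (double) e.getValue() * other;
      }
    }
    for (int v : b.values()) {
      normB += (double) v * v;
    }
    if (normA == 0.0 || normB == 0.0) {
      return 0.0; // an empty vector is treated as having no similarity
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    String page = "apache nutch is a web crawler";
    String gold = "nutch is a scalable web crawler";
    // Compare the same pair of texts under different window sizes.
    for (int n = 1; n <= 3; n++) {
      double score = cosine(ngramCounts(page, n), ngramCounts(gold, n));
      System.out.println("n=" + n + " similarity=" + score);
    }
  }
}
```

With n = 1 this reduces to the unigram model; larger windows make the score sensitive to word order, since two texts containing the same words in a different order share unigrams but few or no longer n-grams.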



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
