nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1326) HostDeduplicator for Nutch
Date Tue, 10 Dec 2013 12:12:08 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844221#comment-13844221 ]

Markus Jelsma commented on NUTCH-1326:
--------------------------------------

Otis, there is no such feature in 2.x yet, but you would only want to automatically emit
rules for the HostNormalizer if you use it and do very large-scale web crawling.

> HostDeduplicator for Nutch
> --------------------------
>
>                 Key: NUTCH-1326
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1326
>             Project: Nutch
>          Issue Type: New Feature
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.9
>
>
> A host deduplicator able to emit rules for the HostNormalizer. 



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
