From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts
Date Wed, 01 Jul 2015 13:04:05 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610079#comment-14610079 ]

Sebastian Nagel commented on NUTCH-1730:
----------------------------------------

Hi Markus, the use case resembles the sitemap injector (NUTCH-1465): fetch and parse a document
(here: in any format, not only sitemaps), extract the outlinks, and inject them (but accept
external links). But it should be possible to add this to scoring-depth by having different
depth thresholds for external and internal links, and possibly also two depth counters.
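
To make the two-counter idea concrete, here is a rough sketch in plain Java. It is not
taken from either attached patch and does not use the scoring-depth plugin's API: the
class DepthSketch, its methods, and the rule that an external hop leaves the internal
counter untouched are all assumptions made for illustration only.

```java
import java.net.URI;

// Hypothetical sketch, NOT the scoring-depth plugin's code: every name here is
// invented to illustrate separate depth counters for internal and external links.
final class DepthSketch {

    // Pair of depth counters that would travel with each URL.
    static final class Depth {
        public final int internal; // hops that stayed on the same host
        public final int external; // hops that crossed to another host

        public Depth(int internal, int external) {
            this.internal = internal;
            this.external = external;
        }
    }

    // A link is treated as external when the two hosts differ (or cannot be determined).
    public static boolean isExternal(String fromUrl, String toUrl) {
        String from = URI.create(fromUrl).getHost();
        String to = URI.create(toUrl).getHost();
        return from == null || to == null || !from.equalsIgnoreCase(to);
    }

    // Depth of an outlink: increment only the counter matching the link type.
    // Assumption: an external hop does not reset or change the internal counter.
    public static Depth next(Depth parent, String fromUrl, String toUrl) {
        return isExternal(fromUrl, toUrl)
            ? new Depth(parent.internal, parent.external + 1)
            : new Depth(parent.internal + 1, parent.external);
    }

    // An outlink is kept only while both counters are within their own threshold.
    public static boolean accept(Depth depth, int maxInternal, int maxExternal) {
        return depth.internal <= maxInternal && depth.external <= maxExternal;
    }
}
```

With an external threshold of 0, external outlinks would be dropped entirely while
internal links are still followed up to the internal threshold; whether the internal
counter should instead restart on each new host is left open here.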

> Scoring-depth optionally not to increment depth for external hosts
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1730
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1730
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.11
>
>         Attachments: NUTCH-1730-trunk.patch, NUTCH-1730.patch
>
>
> Currently, the plugin always increments depth, even when coming from or going to external
> hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)