nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1730) Scoring-depth optionally not to increment depth for external hosts
Date Wed, 01 Jul 2015 08:38:06 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609740#comment-14609740 ]

Markus Jelsma commented on NUTCH-1730:
--------------------------------------

Hello Sebastian!

* thanks! The unit tests are not affected, as both have the same typo
* of course!
* yes, -1 disables it completely, and 0 is not a sensible depth either

The use case is crawling many different hosts without restricting them by the depth of the
initial seed, which was on another host. You are right about links to deep pages on external
hosts, though, so this approach is flawed. Depth must always be controlled from the domain root!
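
To make the flaw concrete, here is a minimal sketch of the idea, not the code from the
attached patches. The class DepthSketch, the method outlinkDepth and the flag
incrementForExternal are hypothetical names, and I am assuming the option simply carries
the parent's depth over unchanged when a link crosses hosts.

```java
import java.net.URI;

class DepthSketch {

    /**
     * Hypothetical helper: depth assigned to an outlink.
     * Assumption: with incrementForExternal == false, a link that crosses
     * hosts keeps the parent's depth instead of incrementing it.
     */
    static int outlinkDepth(String fromUrl, String toUrl,
                            int parentDepth, boolean incrementForExternal) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        // Treat unparsable hosts as external to stay on the safe side.
        boolean external = fromHost == null || !fromHost.equalsIgnoreCase(toHost);
        if (external && !incrementForExternal) {
            return parentDepth; // depth is carried over, not measured from the new host's root
        }
        return parentDepth + 1;
    }

    public static void main(String[] args) {
        // Seed on host a at depth 1, one internal hop, then a link to a deep page on host b.
        int internal = outlinkDepth("http://a.example/", "http://a.example/p.html", 1, false);
        int deepExternal = outlinkDepth("http://a.example/p.html",
                "http://b.example/x/y/z/deep.html", internal, false);
        System.out.println("internal=" + internal + " deepExternal=" + deepExternal);
    }
}
```

The deep page on b.example ends up with the same depth as the page that linked to it, no
matter how far it sits below the root of b.example, so the depth limit on the new host is
counted from an arbitrary entry point rather than from its root.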

> Scoring-depth optionally not to increment depth for external hosts
> ------------------------------------------------------------------
>
>                 Key: NUTCH-1730
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1730
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.7
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.11
>
>         Attachments: NUTCH-1730-trunk.patch, NUTCH-1730.patch
>
>
> Currently, the plugin always increments depth, even when coming from or going to external
> hosts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
