nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <>
Subject [jira] Updated: (NUTCH-731) Redirection of robots.txt in RobotRulesParser
Date Fri, 03 Apr 2009 17:56:12 GMT


Julien Nioche updated NUTCH-731:

    Attachment: NUTCH-731.patch

> Redirection of robots.txt in RobotRulesParser
> ---------------------------------------------
>                 Key: NUTCH-731
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>         Attachments: NUTCH-731.patch
> The attached patch allows one level of redirection to be followed for robots.txt files. A
> similar issue was reported in NUTCH-124 and was marked as fixed a long time ago, but
> the problem remained, at least when using Fetcher2. Mathijs Homminga pointed out the problem
> in a mail to the nutch-dev list in March.
> I have been using this patch for a while now on a large cluster and noticed that the
> ratio of robots_denied per fetchlist went up, meaning that at least we are now getting restrictions
> we would not have had before (and getting fewer complaints from webmasters at the same time).
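
For readers without the attachment at hand, the idea of following one level of redirection can be sketched roughly as below. This is not the code from NUTCH-731.patch: the class RobotsRedirectSketch, the Response holder, and the Transport interface are hypothetical names made up for illustration, and the transport is assumed not to follow redirects on its own.

```java
import java.net.URI;

final class RobotsRedirectSketch {

    /** Minimal response holder: status code, Location header (may be null), body. */
    static final class Response {
        final int status;
        final String location;
        final String body;

        Response(int status, String location, String body) {
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    /** Hypothetical transport; assumed NOT to follow redirects by itself. */
    interface Transport {
        Response get(URI url);
    }

    /** True for the HTTP status codes that carry a redirect target. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /**
     * Fetches robots.txt and follows at most one redirect.
     * If the second response is itself a redirect, it is returned as-is
     * and the caller decides how to treat it.
     */
    static Response fetchRobots(URI robotsUrl, Transport transport) {
        Response first = transport.get(robotsUrl);
        if (!isRedirect(first.status)
                || first.location == null
                || first.location.trim().isEmpty()) {
            return first;
        }
        // Resolve against the original URL so relative Location values work too.
        URI target = robotsUrl.resolve(first.location.trim());
        return transport.get(target);
    }
}
```

With a sketch like this, a host that answers http://example.com/robots.txt with a 301 pointing at http://www.example.com/robots.txt would have the rules from the second URL parsed, instead of the redirect being treated as "no robots.txt".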

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
