nutch-dev mailing list archives

From "Julien Nioche (JIRA)"
Subject [jira] Created: (NUTCH-731) Redirection of robots.txt in RobotRulesParser
Date Fri, 03 Apr 2009 17:54:13 GMT
Redirection of robots.txt in RobotRulesParser

                 Key: NUTCH-731
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 1.0.0
            Reporter: Julien Nioche

The attached patch lets RobotRulesParser follow one level of redirection when fetching robots.txt
files. A similar issue was reported in NUTCH-124 and marked as fixed a long time ago, but the
problem remained, at least when using Fetcher2. Mathijs Homminga pointed out the problem in a
mail to the nutch-dev list in March.
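
The "one level of redirection" behaviour can be illustrated with a minimal sketch. This is not
the attached patch and does not use Nutch classes: RobotsRedirectSketch, Response, fetchRobots
and the fetcher function are hypothetical names introduced only for this example, and the HTTP
layer is replaced by an in-memory map so the logic can be run without network access.

```java
import java.net.URI;
import java.util.Map;
import java.util.function.Function;

public class RobotsRedirectSketch {

    /** Hypothetical stand-in for an HTTP response; not a Nutch class. */
    public static final class Response {
        public final int status;
        public final String location; // value of the Location header, or null
        public final String body;

        public Response(int status, String location, String body) {
            this.status = status;
            this.location = location;
            this.body = body;
        }
    }

    /**
     * Fetches robots.txt and follows at most one redirect.
     * Returns the body on a 200 response, or null if no rules could be obtained.
     * What the caller does with null is left open here (an assumption of this sketch).
     */
    public static String fetchRobots(String robotsUrl, Function<String, Response> fetcher) {
        Response first = fetcher.apply(robotsUrl);
        if (first == null) {
            return null;
        }
        if (first.status == 200) {
            return first.body;
        }
        boolean redirect = first.status >= 300 && first.status < 400;
        if (redirect && first.location != null) {
            // The Location header may be relative; resolve it against the original URL.
            String target = URI.create(robotsUrl).resolve(first.location).toString();
            Response second = fetcher.apply(target);
            // One hop only: a second redirect is not followed.
            if (second != null && second.status == 200) {
                return second.body;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // In-memory "web": the bare host redirects to the www host.
        Map<String, Response> web = Map.of(
            "http://example.com/robots.txt",
            new Response(301, "http://www.example.com/robots.txt", null),
            "http://www.example.com/robots.txt",
            new Response(200, null, "User-agent: *\nDisallow: /private/\n"));

        System.out.println(fetchRobots("http://example.com/robots.txt", web::get));
    }
}
```

Without the single extra hop, the first request in this example would yield no rules at all, so
the Disallow line served by the redirect target would never be seen.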

I have been using this patch for a while now on a large cluster and noticed that the ratio
of robots_denied per fetchlist went up, meaning that we are now at least honoring restrictions
we would previously have missed (and getting fewer complaints from webmasters at the same time).

