nutch-dev mailing list archives

From "Sami Siren (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-247) robot parser to restrict.
Date Fri, 20 Feb 2009 09:43:01 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren updated NUTCH-247:
-----------------------------

    Patch Info: [Patch Available]

> robot parser to restrict.
> -------------------------
>
>                 Key: NUTCH-247
>                 URL: https://issues.apache.org/jira/browse/NUTCH-247
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Stefan Groschupf
>            Assignee: Dennis Kubes
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: agent-names.patch, agent-names3.patch.txt
>
>
> If the agent name and the robots agents are not properly configured, the robot rules parser
> logs the problem with LOG.severe, even though it also corrects the problem itself.
> Later on, the fetcher thread checks for severe errors and stops if one has been logged.
> RobotRulesParser:
>     if (agents.size() == 0) {
>       agents.add(agentName);
>       LOG.severe("No agents listed in 'http.robots.agents' property!");
>     } else if (!((String)agents.get(0)).equalsIgnoreCase(agentName)) {
>       agents.add(0, agentName);
>       LOG.severe("Agent we advertise (" + agentName
>                  + ") not listed first in 'http.robots.agents' property!");
>     }
> Fetcher.FetcherThread:
>     if (LogFormatter.hasLoggedSevere())     // something bad happened
>       break;
> I suggest using warn or a similar level instead of severe to log this problem, since the parser has already corrected it.
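
The suggestion above could be sketched roughly as follows. This is only an illustration, not the attached patch: the class name AgentListSketch and the method normalizeAgents are hypothetical, and it assumes plain java.util.logging rather than whatever logging setup Nutch actually uses. The list is corrected exactly as in the quoted snippet, but the message is logged at warning level, so a check for severe log entries would not be triggered by it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-alone sketch; not the actual RobotRulesParser code.
public class AgentListSketch {
    private static final Logger LOG =
        Logger.getLogger(AgentListSketch.class.getName());

    /**
     * Returns a corrected copy of the configured agent list in which
     * agentName is guaranteed to come first. Misconfiguration is logged
     * as a warning because it is repaired here, not left as a failure.
     */
    public static List<String> normalizeAgents(String agentName,
                                               List<String> configured) {
        List<String> agents = new ArrayList<>(configured);
        if (agents.isEmpty()) {
            agents.add(agentName);
            LOG.warning("No agents listed in 'http.robots.agents' property!");
        } else if (!agents.get(0).equalsIgnoreCase(agentName)) {
            agents.add(0, agentName);
            LOG.warning("Agent we advertise (" + agentName
                + ") not listed first in 'http.robots.agents' property!");
        }
        return agents;
    }

    public static void main(String[] args) {
        // Empty configuration: the advertised agent is added.
        System.out.println(normalizeAgents("MyCrawler", new ArrayList<String>()));
        // Advertised agent missing from the front: it is inserted first.
        System.out.println(normalizeAgents("MyCrawler", Arrays.asList("OtherBot", "*")));
    }
}
```

With this shape, a fetcher-side check for severe entries would stay reserved for problems that were not recovered from.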

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

