nutch-dev mailing list archives

From "Tejas Patil (JIRA)" <>
Subject [jira] [Updated] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons
Date Mon, 07 Jan 2013 08:02:13 GMT


Tejas Patil updated NUTCH-1031:

    Attachment: NUTCH-1031.v1.patch

The changes are done. Please let me know your comments.

One issue: I am not sure how crawler-commons handles multiple agents. One test
case (_testRobotsTwoAgents_) is failing because of that and I have not been able to fix it. Can anyone help?
> Delegate parsing of robots.txt to crawler-commons
> -------------------------------------------------
>                 Key: NUTCH-1031
>                 URL:
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>            Priority: Minor
>              Labels: robots.txt
>             Fix For: 1.7
>         Attachments: NUTCH-1031.v1.patch
> We're about to release the first version of Crawler-Commons []
> which contains a parser for robots.txt files. This parser should also be better than the one
> we currently have in Nutch. I will delegate this functionality to CC as soon as it is available

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
