nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.
Date Tue, 22 Jan 2013 01:36:13 GMT


Lewis John McGibbney commented on NUTCH-1284:

I am +1 for committing this to trunk. IIRC 2.x also suffers from the bug in NUTCH-1042 and
would also benefit from the improvements you propose in this issue, Tejas. It would be
really great if you were able to patch 2.x as well. Great work.
> Add site fetcher.max.crawl.delay as log output by default.
> ----------------------------------------------------------
>                 Key: NUTCH-1284
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Tejas Patil
>            Priority: Trivial
>             Fix For: 1.7, 2.2
>         Attachments: NUTCH-1284.patch, NUTCH-1284-trunk.v1.patch
> Currently, when manually scanning our log output we cannot infer which pages are governed
> by a crawl delay between successive fetch attempts of any given page within the site. The
> value should be made available as something like:
> {code}
> 2012-02-19 12:33:33,031 INFO  fetcher.Fetcher - fetching (crawl.delay=XXXms)
> {code}
> This way we can easily and quickly determine whether the fetcher is having to use this
> functionality or not.
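
For illustration only, the requested log message could be assembled roughly as follows. This is a minimal sketch, not the attached patch: the class `CrawlDelayLogLine`, the helper `fetchingMessage`, and the example URL are all hypothetical, and the sketch assumes the delay is already known in milliseconds.

```java
import java.util.Locale;

// Hypothetical illustration; none of these names come from Nutch or the patch.
class CrawlDelayLogLine {

    // Builds the message part of the log line in the shape proposed in the issue:
    // "fetching <url> (crawl.delay=<n>ms)". The timestamp, level and logger name
    // would be added by the logging framework, so they are not produced here.
    public static String fetchingMessage(String url, long crawlDelayMs) {
        return String.format(Locale.ROOT, "fetching %s (crawl.delay=%dms)", url, crawlDelayMs);
    }

    public static void main(String[] args) {
        // Assumed example values: a 5 second crawl delay for a placeholder URL.
        System.out.println(fetchingMessage("http://example.com/", 5000L));
    }
}
```

Keeping the delay in a fixed `key=value` form makes it easy to grep the fetcher log for `crawl.delay=` when checking whether the delay is being applied.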
