nutch-dev mailing list archives

From "Tejas Patil (JIRA)"
Subject [jira] [Updated] (NUTCH-1284) Add site fetcher.max.crawl.delay as log output by default.
Date Sun, 20 Jan 2013 10:42:16 GMT


Tejas Patil updated NUTCH-1284:

    Attachment: NUTCH-1284-trunk.v1.patch

Hi Lewis,
If I recall correctly, we want the crawl delay for the URL (and hence its queue's delay) to
be logged when fetching of that URL begins. Right?
> Add site fetcher.max.crawl.delay as log output by default.
> ----------------------------------------------------------
>                 Key: NUTCH-1284
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: nutchgora, 1.5
>            Reporter: Lewis John McGibbney
>            Assignee: Tejas Patil
>            Priority: Trivial
>             Fix For: 1.7, 2.2
>         Attachments: NUTCH-1284.patch, NUTCH-1284-trunk.v1.patch
> Currently, when manually scanning our log output we cannot infer which pages are governed
> by a crawl delay between successive fetch attempts of any given page within the site. The
> value should be made available as something like:
> {code}
> 2012-02-19 12:33:33,031 INFO  fetcher.Fetcher - fetching (crawl.delay=XXXms)
> {code}
> This way we can easily and quickly determine whether the fetcher is having to use this
> functionality or not.
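
As a rough illustration of the log line proposed above, here is a minimal Java sketch. It is not taken from the attached patches and does not use the Nutch API; the class name CrawlDelayLogExample, the helper fetchingMessage, and the example URL and delay value are all hypothetical. It only shows how a per-URL "fetching" message could carry the crawl delay in milliseconds.

```java
public class CrawlDelayLogExample {

    // Hypothetical helper (not part of Nutch): builds a "fetching" message
    // that includes the crawl delay, in milliseconds, applied to the URL's queue.
    static String fetchingMessage(String url, long crawlDelayMs) {
        return "fetching " + url + " (crawl.delay=" + crawlDelayMs + "ms)";
    }

    public static void main(String[] args) {
        // Assumed example values; a real fetcher would take these from its fetch queue.
        System.out.println(fetchingMessage("http://example.com/", 5000L));
    }
}
```

In a real fetcher the string would presumably be passed to the existing logger at INFO level rather than printed to standard output.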
