nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Commented: (NUTCH-770) Timebomb for Fetcher
Date Sat, 28 Nov 2009 20:57:20 GMT


Andrzej Bialecki commented on NUTCH-770:

I propose to change the name of this functionality - "timebomb" is not self-explanatory, and
it suggests that if you misbehave then your cluster may explode ;) Instead I would use "time
limit", rename all vars and methods to follow this naming, and document it properly in nutch-default.xml.

A few comments on the patch:

* it has some overlap with NUTCH-769 (the emptyQueue() method), but that's easy to resolve,
see also the next point.

* why change the code in FetchQueues at all? The time limit is a global condition, so we could
just break the main loop in run() and ignore the QueueFeeder (or not start it at all if the
time limit has already passed when run() starts).

* the patch does not follow the code style (notably whitespace in for/while loops and assignments).
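
The second point above (treat the time limit as a global condition checked in the main loop) could look roughly like the sketch below. This is not the Nutch Fetcher code: the class TimeLimitSketch, its methods, the use of -1 to mean "no limit", and the simulated clock are all assumptions made for illustration. The only things taken from the issue are that the limit is given in minutes relative to the start of the job and that it is off by default.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of a global time limit; not the actual Nutch Fetcher. */
public class TimeLimitSketch {

  /** Absolute deadline in ms, or -1 if the limit is disabled (assumed convention). */
  static long deadline(long startMillis, long limitMins) {
    if (limitMins < 0) {
      return -1;
    }
    return startMillis + limitMins * 60L * 1000L;
  }

  /** True once the (enabled) deadline has been reached. */
  static boolean timeLimitReached(long deadlineMillis, long nowMillis) {
    return deadlineMillis != -1 && nowMillis >= deadlineMillis;
  }

  /**
   * Stand-in for the main loop in run(). Each item costs costMillis of
   * simulated time so the example is deterministic. Returns the number of
   * items processed before the loop ended.
   */
  static int run(Queue<String> urls, long startMillis, long limitMins, long costMillis) {
    long deadlineMillis = deadline(startMillis, limitMins);
    long now = startMillis;
    int fetched = 0;
    while (!urls.isEmpty()) {
      if (timeLimitReached(deadlineMillis, now)) {
        break; // global condition: leave the loop, the queues are not touched
      }
      urls.poll();
      fetched++;
      now += costMillis;
    }
    return fetched;
  }

  public static void main(String[] args) {
    Queue<String> urls = new ArrayDeque<>();
    for (int i = 0; i < 10; i++) {
      urls.add("http://example.com/" + i);
    }
    // 1 minute limit, 20 simulated seconds per item
    System.out.println(run(urls, 0L, 1L, 20_000L));
  }
}
```

Because the check sits at the top of the loop, a limit that has already passed when run() starts means nothing is processed at all, which corresponds to not starting the QueueFeeder in that case.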

> Timebomb for Fetcher
> --------------------
>                 Key: NUTCH-770
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Julien Nioche
>         Attachments: log-770, NUTCH-770.patch
> This patch provides the Fetcher with a timebomb mechanism. By default the timebomb is
> not activated; it can be set using the parameter fetcher.timebomb.mins. The number of minutes
> is relative to the start of the Fetch job. When the number of minutes is reached, the QueueFeeder
> skips all remaining entries, then all active queues are purged. This keeps the Fetch
> step under control and works well in combination with NUTCH-769.

