nutch-dev mailing list archives

From "Ferdy Galema (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x
Date Mon, 07 Jan 2013 10:48:15 GMT


Ferdy Galema commented on NUTCH-1508:

NUTCH-1431 (aka the 'distance' concept) only defines a global one. However, for an internal branch
I created a hack that allows specifying it on a per-host basis using the host table. Not very

I think NUTCH-1331 is the better approach, because it is indeed less intrusive and because
it allows scoring to be defined instead of ignoring depth-exceeding URLs. (It also keeps 1.x
and 2.x differences to a minimum.) So when this gets implemented for 2.x we can throw away
the changes in NUTCH-1431.

> Port limit crawler to defined depth to 2.x
> ------------------------------------------
>                 Key: NUTCH-1508
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.2
>            Reporter: Julien Nioche

