nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-382) Fix for NUTCH-365 introduced a bug if generate.max.per.host.by.ip is enabled
Date Wed, 06 Feb 2008 12:55:19 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566117#action_12566117
] 

Andrzej Bialecki  commented on NUTCH-382:
-----------------------------------------

This has been fixed as a part of another commit.

> Fix for NUTCH-365 introduced a bug if generate.max.per.host.by.ip is enabled
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-382
>                 URL: https://issues.apache.org/jira/browse/NUTCH-382
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 0.9.0
>            Reporter: Jim Kellerman
>             Fix For: 1.0.0
>
>         Attachments: patch.txt
>
>
> The fix for NUTCH-365 in org.apache.nutch.crawl.Generator.java (revision 449088) introduced
a bug: if generate.max.per.host.by.ip is enabled, the generator logs the warning "WARN  crawl.Generator
(Generator.java:reduce(181)) - Malformed URL: '38.99.15.82', skipping". The IP address in the
message varies according to the host.
> This is because the hostname has already been converted to its IP address, and the code:
>               host = normalizers.normalize(host, URLNormalizers.SCOPE_GENERATE_HOST_COUNT);
> will not normalize an IP address. To fix this problem, the code inserted in revision 449088
should be placed inside an else block so that it is not executed
if generate.max.per.host.by.ip is enabled.
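
The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the actual Generator code or the attached patch: the class HostKeySketch, the methods normalizeStandIn and hostKey, and the resolver parameter are hypothetical names introduced only for illustration, and normalizeStandIn is an assumed stand-in for the URLNormalizers call that simply rejects anything that does not parse as a URL. The IP lookup is passed in as a function so the example does not depend on DNS.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the "else block" fix described above.
final class HostKeySketch {

    // Assumed stand-in for the normalizer: it accepts only full URLs,
    // so a bare IP such as "38.99.15.82" is rejected as malformed.
    static String normalizeStandIn(String urlString) {
        try {
            URL u = new URL(urlString);
            return u.getProtocol() + "://" + u.getHost().toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: '" + urlString + "'", e);
        }
    }

    // Returns the key used for per-host counting. The normalization step
    // sits in the else branch, so it never sees an already-resolved IP.
    static String hostKey(String urlString, boolean byIp, UnaryOperator<String> resolver) {
        try {
            String host = new URL(urlString).getHost();
            if (byIp) {
                // Count by IP address: use the resolved address as-is.
                return resolver.apply(host);
            } else {
                // Count by host name: normalize, then extract the host.
                return new URL(normalizeStandIn(urlString)).getHost();
            }
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Malformed URL: '" + urlString + "'", e);
        }
    }

    public static void main(String[] args) {
        // Fixed lookup used in place of a real DNS query (illustrative only).
        UnaryOperator<String> fakeResolver = h -> "38.99.15.82";
        System.out.println(hostKey("http://Example.COM/page", true, fakeResolver));
        System.out.println(hostKey("http://Example.COM/page", false, fakeResolver));
    }
}
```

With byIp set to true, the sketch returns the resolved address without passing it through the normalizer, which is the behavior the else block is meant to restore.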

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.