nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-606) Refactoring of Generator, run all urls through checks
Date Sat, 09 Feb 2008 00:13:07 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567250#action_12567250 ]

Andrzej Bialecki commented on NUTCH-606:
-----------------------------------------

+1. A minor issue: I don't think URL.getHost() can return a null value; even for URLs with an
unspecified host name it returns an empty, non-null String.
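If getHost() yields "" rather than null when a URL parsed from a string has no host (for example `file:/tmp/x`), a host check should test for emptiness, not just for null. The sketch below is illustrative only: the class `HostCheck` and the helper `hostOrNull` are hypothetical names, not part of Nutch.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

public class HostCheck {
    // Hypothetical helper: returns the lower-cased host, or null when the
    // string does not parse as a URL or the parsed URL has no usable host.
    static String hostOrNull(String spec) {
        try {
            String host = new URL(spec).getHost();
            // Emptiness is the case that matters for string-parsed URLs;
            // the null test is kept only as cheap insurance.
            if (host == null || host.isEmpty()) {
                return null;
            }
            return host.toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // Print the raw getHost() value in brackets so an empty string is visible.
        for (String spec : new String[] {"http://Example.COM/a", "file:/tmp/x"}) {
            System.out.println(spec + " -> [" + new URL(spec).getHost() + "]");
        }
    }
}
```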

> Refactoring of Generator, run all urls through checks
> -----------------------------------------------------
>
>                 Key: NUTCH-606
>                 URL: https://issues.apache.org/jira/browse/NUTCH-606
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>         Environment: all
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-606-1-20080208.patch, NUTCH-606-2-20080208.patch
>
>
> Refactor the generator to make sure all urls run through checks such as host and protocol
> checks, and ip checks if necessary. Currently the generator only does this for urls if
> generate.max.per.host > 0, which by default is -1. So by default all urls will get collected
> without checks.
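The intended change can be pictured as moving the checks out of the `generate.max.per.host > 0` branch so that every URL passes through them, while the per-host count limit still applies only when the setting is positive. This is a minimal sketch under stated assumptions, not Nutch's actual Generator code: `GeneratorSketch`, `select`, and the `ALLOWED_PROTOCOLS` set are hypothetical stand-ins for the real host/protocol filters, and IP-based checks are omitted.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class GeneratorSketch {
    // Hypothetical stand-in for a protocol check (not Nutch's filter API).
    static final Set<String> ALLOWED_PROTOCOLS = Set.of("http", "https");

    // Returns the URLs that pass the checks; maxPerHost <= 0 means "no limit",
    // mirroring the default of -1 for generate.max.per.host.
    static List<String> select(List<String> urls, int maxPerHost) {
        Map<String, Integer> perHost = new HashMap<>();
        List<String> selected = new ArrayList<>();
        for (String spec : urls) {
            URL url;
            try {
                url = new URL(spec);
            } catch (MalformedURLException e) {
                continue; // unparsable URLs are dropped
            }
            String host = url.getHost().toLowerCase(Locale.ROOT);
            // Host and protocol checks run for every URL, whatever maxPerHost is.
            if (host.isEmpty() || !ALLOWED_PROTOCOLS.contains(url.getProtocol())) {
                continue;
            }
            // Only the per-host count limit depends on maxPerHost.
            if (maxPerHost > 0 && perHost.merge(host, 1, Integer::sum) > maxPerHost) {
                continue;
            }
            selected.add(spec);
        }
        return selected;
    }
}
```

With the default of -1, an `ftp://` URL or a host-less `file:/tmp/x` entry is now rejected instead of being collected unchecked; with a positive limit, the same checks run and the count cap is applied on top.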

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
