nutch-dev mailing list archives

From "Markus Jelsma (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1186) FreeGenerator always normalizes
Date Wed, 09 Nov 2011 14:43:51 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147069#comment-13147069 ]

Markus Jelsma commented on NUTCH-1186:
--------------------------------------

This is actually not the FreeGenerator but the URLPartitioner class, which does the normalizing
in the partitioning scope. I'm not sure what the right behaviour would be. The common Generator
is also affected, since it uses the partitioner when turning fetch lists into segments. Without
scope, this means ALL selected URLs are normalized at least once, and twice when normalizing is
actually in use.
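To illustrate the double-normalization point, here is a minimal, dependency-free Java sketch. It is NOT Nutch's URLPartitioner: the class name PartitionSketch, the normalize flag, the pluggable normalizer function and the host-hash scheme are all hypothetical, chosen only to show how gating the partitioner's normalization on a flag (analogous to -normalize) would skip the unconditional extra pass.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch of a host-based URL partitioner (not the Nutch class).
 * Normalization only runs when the caller enables it.
 */
class PartitionSketch {

    private final boolean normalize;               // hypothetical switch mirroring -normalize
    private final UnaryOperator<String> normalizer; // stand-in for a normalizer chain
    private int normalizeCalls = 0;                // how many times normalization ran

    PartitionSketch(boolean normalize, UnaryOperator<String> normalizer) {
        this.normalize = normalize;
        this.normalizer = normalizer;
    }

    /** Returns a partition in [0, numPartitions) derived from the URL's host. */
    int getPartition(String url, int numPartitions) {
        String u = url;
        if (normalize) {
            u = normalizer.apply(u);
            normalizeCalls++;
        }
        String host;
        try {
            host = new URI(u).getHost();
        } catch (URISyntaxException e) {
            host = null; // unparsable URL: fall back to hashing the whole string
        }
        String key = (host == null) ? u : host.toLowerCase(Locale.ROOT);
        // Mask the sign bit so the modulo result is never negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    int normalizeCalls() {
        return normalizeCalls;
    }

    public static void main(String[] args) {
        UnaryOperator<String> trim = String::trim; // trivial stand-in normalizer
        PartitionSketch off = new PartitionSketch(false, trim);
        PartitionSketch on = new PartitionSketch(true, trim);
        String url = "http://Example.com/a";
        int p1 = off.getPartition(url, 4);
        int p2 = on.getPartition(url, 4);
        System.out.println(p1 == p2);                                      // prints true
        System.out.println(off.normalizeCalls() + " " + on.normalizeCalls()); // prints 0 1
    }
}
```

With the flag off, URLs for one host still hash to the same partition here because the key is the lower-cased host; whether skipping normalization is safe in practice depends on the input URLs already being normalized.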

Thoughts?
                
> FreeGenerator always normalizes
> -------------------------------
>
>                 Key: NUTCH-1186
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1186
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.3
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>
> The FreeGenerator does not honor the -normalize option; it always normalizes all URLs
> in the input directory. The -filter option is respected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
