nutch-dev mailing list archives

From "Emmanuel Joke (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-548) Move URLNormalizer from Outlink to ParseOutputFormat
Date Tue, 04 Sep 2007 10:38:58 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524669 ]

Emmanuel Joke commented on NUTCH-548:
-------------------------------------

Actually, I have one comment/question. I noticed that we normalize and filter every link in
ParseOutputFormat, and then we do it again in CrawlDbFilter during the updateDb procedure.
Is it really necessary to do it twice, or could we also remove this duplicate operation?



> Move URLNormalizer from Outlink to ParseOutputFormat
> ----------------------------------------------------
>
>                 Key: NUTCH-548
>                 URL: https://issues.apache.org/jira/browse/NUTCH-548
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>            Reporter: Emmanuel Joke
>            Assignee: Emmanuel Joke
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-548.patch
>
>
> The idea is to avoid instantiating a new URLNormalizer for every Outlink,
> so I moved this operation to the ParseOutputFormat object.
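
To illustrate the idea in the description above, here is a minimal sketch of the difference between building a normalizer per link and reusing one owned by the caller. It does not use the Nutch API: CountingNormalizer, NormalizerReuseSketch, and the toy lower-casing rule are hypothetical stand-ins, assumed only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a normalizer that is costly to construct.
// NOT the Nutch URLNormalizer; it only counts how often it is created.
class CountingNormalizer {
    static int instances = 0;

    CountingNormalizer() {
        instances++; // a real normalizer might load plugins or configuration here
    }

    String normalize(String url) {
        // Toy rule (assumption): trim whitespace and lower-case the whole URL.
        return url.trim().toLowerCase(Locale.ROOT);
    }
}

class NormalizerReuseSketch {
    // Before: every link constructs its own normalizer.
    static List<String> normalizePerLink(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(new CountingNormalizer().normalize(url));
        }
        return out;
    }

    // After: the caller creates one normalizer and reuses it for all links.
    static List<String> normalizeShared(List<String> urls) {
        CountingNormalizer shared = new CountingNormalizer();
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(shared.normalize(url));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(" HTTP://Example.com/A ", "http://example.org/b");

        CountingNormalizer.instances = 0;
        normalizePerLink(urls);
        System.out.println("per-link instances: " + CountingNormalizer.instances);

        CountingNormalizer.instances = 0;
        normalizeShared(urls);
        System.out.println("shared instances: " + CountingNormalizer.instances);
    }
}
```

Both methods return the same normalized URLs; only the number of constructed normalizers differs, which is the cost the patch is meant to avoid.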

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

