nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1011) Normalize duplicate slashes in URL's
Date Mon, 04 Jul 2011 14:31:22 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059458#comment-13059458 ]

Markus Jelsma commented on NUTCH-1011:
--------------------------------------

With NUTCH-1013 resolved, is the patch eligible for inclusion?

> Normalize duplicate slashes in URL's
> ------------------------------------
>
>                 Key: NUTCH-1011
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1011
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.4, 2.0
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-1011-all-3.patch
>
>
> Many websites produce faulty URLs with multiple slashes, e.g. http://cocoon.apache.org///////////////////////1.x/dynamic.html
> This can be really nasty if the number of slashes varies, resulting in many URLs actually pointing to the same page and generating new (unique) URLs to the same or other duplicate pages.
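
For illustration only, the normalization described in the issue could be sketched roughly as follows. This is not the attached NUTCH-1011-all-3.patch; the class name SlashNormalizer and the method collapseSlashes are hypothetical, and the sketch assumes that only the part of the URL between the "://" separator and the first "?" or "#" should be touched, so that slashes inside a query string or fragment are left alone.

```java
// Hypothetical helper, not part of Nutch: collapses runs of slashes
// in the host/path portion of a URL.
final class SlashNormalizer {

    private SlashNormalizer() {
    }

    static String collapseSlashes(String url) {
        // Skip the scheme separator so "http://" keeps its two slashes.
        int schemeEnd = url.indexOf("://");
        int start = schemeEnd < 0 ? 0 : schemeEnd + 3;

        // Stop at the query string or fragment, whichever comes first.
        int end = url.length();
        int query = url.indexOf('?', start);
        int fragment = url.indexOf('#', start);
        if (query >= 0) {
            end = Math.min(end, query);
        }
        if (fragment >= 0) {
            end = Math.min(end, fragment);
        }

        // Replace every run of two or more slashes with a single slash.
        String middle = url.substring(start, end).replaceAll("/{2,}", "/");
        return url.substring(0, start) + middle + url.substring(end);
    }
}
```

With this sketch, URLs that differ only in the number of consecutive slashes map to the same string, so they would no longer be treated as distinct pages.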

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
