nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1748) urlfilter-validator to allow .. (two dots) inside file names (path elements)
Date Wed, 09 Apr 2014 13:26:16 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964127#comment-13964127 ]

Sebastian Nagel commented on NUTCH-1748:
----------------------------------------

Hi [~alexmc], you're absolutely right: the analogy to Unix file names ([drawn here|http://mail-archives.apache.org/mod_mbox/nutch-dev/201404.mbox/%3C533F1D81.7020401%40googlemail.com%3E])
is not relevant. To reformulate: urlfilter-validator should allow two dots inside
path elements, e.g. inside the "file name" as in [~msertacturkel]'s example:
{code}
http://www.example.com/example-example..-16067h.htm
{code}
Of course, there must be surrounding characters (leading or trailing): a path element
consisting only of ".." should be rejected to avoid trivial duplicates on the URL level.
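
The rule above can be sketched roughly as follows. This is only an illustration, not the actual urlfilter-validator code: the class name {{DotDotPathCheck}} and the method {{hasDotDotElement}} are hypothetical, and the sketch assumes the URL can be parsed by {{java.net.URI}}. It splits the raw path on "/" and flags only path elements that are exactly "..", so two dots with leading or trailing characters pass.

```java
import java.net.URI;

// Hypothetical helper, not part of Nutch: illustrates the proposed rule only.
public class DotDotPathCheck {

    /**
     * Returns true if any path element of the URL is exactly "..".
     * Two dots inside a longer path element (e.g. "abc..xyz.txt") are not flagged.
     */
    static boolean hasDotDotElement(String url) {
        // URI.create does not normalize the path, so ".." elements are preserved.
        String path = URI.create(url).getRawPath();
        if (path == null) {
            return false; // opaque URI without a hierarchical path
        }
        // A limit of -1 keeps trailing empty elements; harmless for this check.
        for (String element : path.split("/", -1)) {
            if (element.equals("..")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two dots inside the "file name": should be allowed by the filter.
        System.out.println(hasDotDotElement("http://www.example.com/example-example..-16067h.htm")); // false
        // "/../" inside the path: should be rejected.
        System.out.println(hasDotDotElement("http://www.example.com/a/../b.htm")); // true
        // "/.." in final position: should be rejected.
        System.out.println(hasDotDotElement("http://www.example.com/a/..")); // true
    }
}
```

A filter built on this check would reject a URL when the method returns true and apply its remaining validation otherwise.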


> urlfilter-validator to allow .. (two dots) inside file names (path elements)
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-1748
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1748
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.2.1
>            Reporter: Sertac TURKEL
>            Priority: Minor
>             Fix For: 2.3
>
>
> Unix systems accept file names containing two dots, e.g. "abc..xyz.txt". So
> urlfilter-validator should not reject this kind of URL. Paths containing "/../",
> or "/.." in final position, should still be rejected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
