nutch-dev mailing list archives

From "Alex McLintock (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1748) Despite Unix systems accept files containing two dots.Urlfilter-validator rejects such path names.
Date Wed, 09 Apr 2014 12:09:14 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964069#comment-13964069
] 

Alex McLintock commented on NUTCH-1748:
---------------------------------------

FYI

"The similarity to unix and other disk operating system filename conventions should be taken
as purely coincidental, and should not be taken to indicate that URIs should be interpreted
as file names."
(quoted from http://www.w3.org/Addressing/URL/4_URI_Recommentations.html)

That page also says:

"The slash ("/", ASCII 2F hex) character is reserved for the delimiting of substrings whose
relationship is hierarchical. This enables partial forms of the URI. Substrings consisting
of single or double dots ("." or "..") are similarly reserved."

So if we assume that a "substring" here means a complete slash-delimited path segment, then
"/../" (a segment consisting only of "..") is NOT allowed, but ".." with one or more other
characters in the same segment, as in "abc..xyz.txt", should be.
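
To make that reading concrete, here is a minimal sketch of the check in Java. It is not the
actual urlfilter-validator code: the class and method names (DotSegmentCheck, hasDotSegment)
are made up for illustration, and it assumes the path component has already been extracted
from the URL.

```java
// Hypothetical helper for illustration only; not taken from Nutch.
final class DotSegmentCheck {

    // Returns true when some slash-delimited segment of the path is exactly
    // "." or "..". Dots embedded in a longer segment do not count.
    static boolean hasDotSegment(String path) {
        for (String segment : path.split("/")) {
            if (segment.equals(".") || segment.equals("..")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample paths: embedded dots, "/../" in the middle, "/.." at the end.
        String[] samples = { "/abc..xyz.txt", "/a/../b", "/a/.." };
        for (String path : samples) {
            System.out.println(path + " -> " + hasDotSegment(path));
        }
    }
}
```

Under this reading, "/abc..xyz.txt" would pass, while "/a/../b" and "/a/.." would still be
rejected, which matches what the issue description asks for.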


> Despite Unix systems accept files containing two dots.Urlfilter-validator rejects such
path names.
> --------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1748
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1748
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 2.2.1
>            Reporter: Sertac TURKEL
>            Priority: Minor
>             Fix For: 2.3
>
>
> Unix systems accept file names containing two dots, such as "abc..xyz.txt", so
> urlfilter-validator should not reject this kind of URL. Paths containing "/../",
> or "/.." in final position, should still be rejected.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
