nutch-dev mailing list archives

From "Julien Nioche (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1098) better url-normalizer basic
Date Thu, 03 Nov 2011 10:45:32 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143038#comment-13143038 ]

Julien Nioche commented on NUTCH-1098:
--------------------------------------

@Radim

Sounds like "I am not going to" is your favourite phrase.

You certainly prefer Git to SVN, but the fact is that Nutch uses the latter, and contributions
are expected to be generated with 'svn diff'. By going along with what the rest of the community
does (rather than imposing your ways on others), you will make it easier for people to discuss,
review and commit your contributions.

http://wiki.apache.org/nutch/HowToContribute >> Creating a patch
http://www.apache.org/foundation/how-it-works.html  >> Philosophy

Thanks
                
> better url-normalizer basic
> ---------------------------
>
>                 Key: NUTCH-1098
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1098
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>
>         Attachments: patch-with-utf8-encoding.diff
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The basic URL normalizer lacks 2 important features:
> Encode spaces in URLs as %20, to unbreak httpclient and possibly other clients that do not expect a space inside a URL.
> Decode percent-encoding (e.g. %33) in URLs. This is important for avoiding duplicates.
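The two features requested above can be sketched in Java as a single pass over the URL string. This is a hypothetical illustration, not Nutch's BasicURLNormalizer and not the attached patch: the class name UrlNormalizerSketch and its normalize method are made up for this example. It assumes that decoding should be limited to the unreserved characters of RFC 3986 (letters, digits, '-', '.', '_', '~'), such as %33 for '3'. Decoding a reserved character like %2F ('/') could change what the URL points to, so such escapes are left encoded and only their hex digits are upper-cased.

```java
/**
 * Hypothetical sketch of the two requested behaviours; NOT Nutch's
 * BasicURLNormalizer API. Assumes decoding is limited to RFC 3986
 * unreserved characters so that the meaning of the URL is unchanged.
 */
public class UrlNormalizerSketch {

    // RFC 3986, section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    private static boolean isUnreserved(int c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    public static String normalize(String url) {
        StringBuilder out = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (c == ' ') {
                // Feature 1: a literal space becomes %20.
                out.append("%20");
            } else if (c == '%' && i + 2 < url.length()) {
                int hi = Character.digit(url.charAt(i + 1), 16); // -1 if not hex
                int lo = Character.digit(url.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    out.append(c); // malformed escape: keep the '%' as-is
                    continue;
                }
                int decoded = hi * 16 + lo;
                if (isUnreserved(decoded)) {
                    // Feature 2: %33 -> '3', %7E -> '~', etc.
                    out.append((char) decoded);
                } else {
                    // Keep reserved/other bytes encoded, with upper-case hex digits.
                    out.append('%')
                       .append(Character.toUpperCase(url.charAt(i + 1)))
                       .append(Character.toUpperCase(url.charAt(i + 2)));
                }
                i += 2; // skip the two hex digits just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/my page/%7Euser%2Fx"));
    }
}
```

With this sketch, http://example.com/%33d and http://example.com/3d both normalize to http://example.com/3d, so they would no longer be treated as two distinct URLs. Only the literal space character is encoded; other characters that may need escaping (tabs, non-ASCII text) are outside the scope of this example.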

--
This message is automatically generated by JIRA.
