nutch-dev mailing list archives

From "Radim Kolar (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1098) better url-normalizer basic
Date Fri, 04 Nov 2011 16:59:51 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144160#comment-13144160 ]

Radim Kolar commented on NUTCH-1098:
------------------------------------

If you are so clever and hard-working, then stop undeleting my patch and write a better one
yourself. I am licensing my work under the Affero GPL v3 from now on.

You simply need months to discuss a trivial code change. Everybody here claims to be smart
and hard-working, but look at your results: a mere 13 trivial commits in October.
Look at my results: I have 2.1 billion files indexed in 4 months.

I reworked a major portion of Nutch and I don't want to spend years waiting to see if and
when it will ever be merged. I have the Hadoop 0.21 API, a generator with a pluggable
algorithm, fixed building with Maven, the database backend switched to Cassandra, and other
stuff. For me it is far better to just pull 20 of your patches per month from GitHub and not
waste my time with you in pointless discussions like git vs. svn diff format.
                
> better url-normalizer basic
> ---------------------------
>
>                 Key: NUTCH-1098
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1098
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>
>         Attachments: patch-with-utf8-encoding.diff
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The basic URL normalizer lacks 2 important features:
> - Encoding a space in a URL as %20, to unbreak httpclient and possibly others that do not
>   expect a space inside a URL.
> - The ability to decode percent-encoded characters such as %33 in a URL. This is important
>   for avoiding duplicates.
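
The two features requested in the issue description can be sketched as follows. This is a minimal illustration in Python, not the attached patch and not Nutch's Java code. The function name `normalize_url` is hypothetical. Restricting decoding to the RFC 3986 "unreserved" characters, and uppercasing the hex digits of the escapes that are kept, are assumptions that the issue text does not specify.

```python
import re

# Unreserved characters per RFC 3986, section 2.3. Decoding their
# percent-escapes does not change the meaning of a URL.
_UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-encoded octet such as %33 or %2f.
_PERCENT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_unreserved(match):
    """Decode an escape only if it stands for an unreserved character."""
    char = chr(int(match.group(1), 16))
    if char in _UNRESERVED:
        return char
    # Assumption: keep every other escape, with uppercase hex digits.
    return "%" + match.group(1).upper()


def normalize_url(url):
    """Hypothetical helper: apply the two requested normalizations."""
    # Feature 1: a literal space becomes %20.
    url = url.replace(" ", "%20")
    # Feature 2: escapes such as %33 are decoded, so that equivalent
    # URLs collapse to one form and are not treated as duplicates.
    return _PERCENT_TRIPLET.sub(_decode_unreserved, url)


print(normalize_url("http://example.com/a b"))   # http://example.com/a%20b
print(normalize_url("http://example.com/%33"))   # http://example.com/3
print(normalize_url("http://example.com/a%2fb")) # http://example.com/a%2Fb
```

Escapes for reserved characters such as `/` (`%2F`) are left encoded in this sketch, because decoding them could change which resource the URL refers to.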

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
