nutch-dev mailing list archives

From "Chris A. Mattmann (Commented) (JIRA)" <>
Subject [jira] [Commented] (NUTCH-1098) better url-normalizer basic
Date Fri, 04 Nov 2011 16:09:51 GMT


Chris A. Mattmann commented on NUTCH-1098:

Guys: let's change the tone of this issue, OK?

Radim, thanks for your patch. Sorry that it didn't get applied or that folks tried to engage
in feedback/discussion with you on it. I would encourage you to not get discouraged and I
appreciate your effort in trying to contribute to the Apache Nutch project.

The committers are the ones that have to figure out how to maintain things, and sometimes we
get hung up on (yes, I'll agree) less important issues. I'm going to recommend that everyone
just table those for the moment and that we move forward here.

Here are some concrete next steps:

1. Ferdy: is it possible to commit a portion of this patch that you do understand? Then we
could leave the part that you don't understand uncommitted. This has 2 immediate goals:
  - gives Radim a good feeling for contributing to the project -- he deserves that.
  - gives us the ability to cherry pick what we understand and are willing to maintain

2. Radim: if you want to help in improving the formatting and other requested issues, great.
If you don't, that's fine too. At that point, though, the maintenance/evolution of the patch
will transition more into the Nutch folks and you might not be as involved with it unless
you get on board with what the guys have decided are their code formatting and patch generation


> better url-normalizer basic
> ---------------------------
>                 Key: NUTCH-1098
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Any
>            Reporter: Radim Kolar
>            Assignee: Markus Jelsma
>              Labels: encoding, url
>             Fix For: 1.5
>         Attachments: patch-with-utf8-encoding.diff
>   Original Estimate: 4h
>  Remaining Estimate: 4h
> Basic URL normalizer lacks 2 important features:
>   - Encode space in URL into %20 to unbreak httpclient and possibly others who do not expect
>     space inside URL
>   - Ability to decode %33 encoding in URL. This is important for avoiding duplicates
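
For illustration only, the two features requested in the issue description can be sketched as
follows. This is not the attached patch and not Nutch code (Nutch is written in Java; Python is
used here purely for brevity). The name normalize_url and the choice to decode only RFC 3986
"unreserved" characters are assumptions made for this sketch.

```python
import re

# RFC 3986 "unreserved" characters: these never need to be percent-encoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Matches one percent-escape such as %33 or %2f.
PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def _decode_if_unreserved(match):
    """Decode one %XX escape only when it stands for an unreserved character."""
    char = chr(int(match.group(1), 16))
    if char in UNRESERVED:
        return char
    # Leave other escapes encoded, but write their hex digits in upper case.
    return "%" + match.group(1).upper()


def normalize_url(url):
    """Hypothetical helper for this sketch; not the Nutch implementation."""
    # Feature 1: encode literal spaces so HTTP clients do not choke on them.
    url = url.replace(" ", "%20")
    # Feature 2: decode escapes such as %33 ("3") so equivalent URLs compare equal.
    return PERCENT_ESCAPE.sub(_decode_if_unreserved, url)
```

Under these assumptions, normalize_url("http://example.com/a b%33") gives
"http://example.com/a%20b3", so two spellings of the same URL collapse to one form.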

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.

