nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-2196) IndexingFilterChecker to optionally normalize
Date Wed, 13 Jan 2016 12:38:39 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Markus Jelsma updated NUTCH-2196:
---------------------------------
    Attachment: NUTCH-2196.patch

Patch for trunk introducing the -normalize flag. If enabled, input URLs are passed through
the configured normalizers (SCOPE DEFAULT), so it is possible to input unencoded URLs etc.

Removed URLUtil, so it is now also possible to input both encoded and unencoded URLs
at the same time!
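
To illustrate the idea only, here is a minimal, self-contained sketch of an optional normalization step. It is not the code from the attached patch and does not use Nutch's configured normalizers: the class UrlNormalizeSketch and the methods prepare and escapeUnsafe are hypothetical names, and a simple percent-encoding pass stands in for the real normalizer chain. Existing %XX escapes are left untouched, so encoded and unencoded input can be mixed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: NOT the NUTCH-2196 patch and NOT Nutch's URLNormalizers.
class UrlNormalizeSketch {

    // Returns the URL unchanged unless normalization was requested,
    // mirroring the idea of an opt-in -normalize flag.
    static String prepare(String url, boolean normalize) {
        return normalize ? escapeUnsafe(url) : url;
    }

    // Stand-in normalizer: percent-encodes whitespace, control characters and
    // non-ASCII bytes (as UTF-8). Existing "%XX" sequences pass through as-is,
    // so already-encoded input is not double-encoded.
    static String escapeUnsafe(String url) {
        StringBuilder out = new StringBuilder();
        for (byte b : url.trim().getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (c <= 0x20 || c >= 0x7F) {
                out.append(String.format("%%%02X", c));
            } else {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed input: one part already encoded, one part not.
        String input = "http://example.com/a%20b/c d?q=caf\u00e9";
        System.out.println(prepare(input, false)); // printed as entered
        System.out.println(prepare(input, true));  // unsafe characters escaped
    }
}
```

With the flag off, the input is used exactly as entered; with it on, only the characters that are unsafe in a URL are escaped.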

Will commit shortly.

> IndexingFilterChecker to optionally normalize
> ---------------------------------------------
>
>                 Key: NUTCH-2196
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2196
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Trivial
>             Fix For: 1.12
>
>         Attachments: NUTCH-2196.patch
>
>
> As mentioned in NUTCH-2194, we sometimes use IndexingFilterChecker as a backend for a web
> application. In that case, end users are bound to enter malformed URLs, so running a
> normalizer would improve the user experience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
