nutch-dev mailing list archives

From "Dawid Weiss (JIRA)"
Subject [jira] Created: (NUTCH-567) Proper (?) handling of URIs in TagSoup.
Date Wed, 17 Oct 2007 12:07:51 GMT
Proper (?) handling of URIs in TagSoup.

                 Key: NUTCH-567
             Project: Nutch
          Issue Type: Improvement
            Reporter: Dawid Weiss
            Priority: Minor
         Attachments: uri-entities.patch

Doug Cook reported that TagSoup incorrectly handles some URI parameters. There is more
discussion on the list and on TagSoup's mailing list.

I looked at the TagSoup sources because I use it myself (although the URIs are not
relevant for me). It seems you can implement a naive workaround by remembering the parsing
state and simply skipping entity resolution. The attached patch does this.
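
For readers without the patch at hand, the general idea can be sketched roughly as
follows. This is not the attached patch and not TagSoup's API: it is a standalone
Python illustration, and every name in it (ENTITIES, URI_ATTRIBUTES,
NaiveAttributeReader) is hypothetical. It assumes the problem is a lenient parser
treating a query parameter such as "&copy=2" as the entity "&copy;".

```python
import re

# Hypothetical, deliberately tiny entity table (assumption for illustration).
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "copy": "\u00a9"}

# Lenient pattern: the trailing semicolon is optional, as a tolerant
# HTML parser might accept. This leniency is what damages URIs.
_ENTITY = re.compile(r"&([A-Za-z]+);?")

# Attributes assumed to carry URIs (illustrative, not exhaustive).
URI_ATTRIBUTES = {"href", "src", "action"}


def resolve_entities(text):
    """Replace known entity references; leave unknown ones untouched."""
    def replace(match):
        return ENTITIES.get(match.group(1), match.group(0))
    return _ENTITY.sub(replace, text)


class NaiveAttributeReader:
    """Remembers whether the current attribute holds a URI."""

    def __init__(self):
        self.in_uri_attribute = False

    def start_attribute(self, name):
        # Record the parsing state when an attribute begins.
        self.in_uri_attribute = name.lower() in URI_ATTRIBUTES

    def value(self, raw):
        # Naive workaround: inside a URI attribute, skip entity
        # resolution entirely and return the raw text.
        if self.in_uri_attribute:
            return raw
        return resolve_entities(raw)


reader = NaiveAttributeReader()
reader.start_attribute("href")
href = reader.value("http://example.com/?a=1&copy=2")
reader.start_attribute("title")
title = reader.value("a &amp; b")
```

In this sketch the href value comes back unchanged, while the title value has its
entity resolved. The workaround is naive in that a legitimately escaped "&amp;"
inside a URI attribute is also left unresolved.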

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
