maven-issues mailing list archives

From "Jesse Glick (JIRA)" <j...@codehaus.org>
Subject [jira] Commented: (WAGON-218) Link Parsing in http is flawed
Date Mon, 23 Aug 2010 19:33:40 GMT

    [ http://jira.codehaus.org/browse/WAGON-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=232855#action_232855 ]

Jesse Glick commented on WAGON-218:
-----------------------------------

Why not get rid of nekohtml (+ XNI), saving 152 KB and reducing the complexity of the Maven
third-party dependency list, and directly search for {{(?i)<a href="(.+?)">}} or similar? After
all, the intended use case is to find links in index listings generated by a small number
of distinct pieces of software. These generators are surely not going to use the exotic formatting
or attributes produced by humans editing HTML by hand or with WYSIWYG designers.
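
A minimal sketch of what such a direct search could look like, using only java.util.regex.
The class name IndexLinkScanner, the method findLinks and the sample listing are hypothetical,
made up for illustration; this is not the existing Wagon code and not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Wagon.
public class IndexLinkScanner {
    // The pattern suggested above: case-insensitive, with a reluctant
    // capture of everything between href=" and the closing ">.
    private static final Pattern LINK = Pattern.compile("(?i)<a href=\"(.+?)\">");

    // Returns the raw href values in the order they appear in the page.
    public static List<String> findLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up directory listing in the style of a generated index page.
        String listing =
            "<html><body>\n"
            + "<A HREF=\"../\">Parent Directory</A>\n"
            + "<a href=\"1.0-beta-2/\">1.0-beta-2/</a>\n"
            + "<a href=\"maven-metadata.xml\">maven-metadata.xml</a>\n"
            + "</body></html>";
        for (String link : findLinks(listing)) {
            System.out.println(link);
        }
    }
}
```

This exact pattern assumes href is the only attribute and is double-quoted; a tag such as
<a class="x" href="y"> would not match, and extra attributes after href would end up inside
the captured group. Whether that is acceptable depends on the index generators in use.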

I would be happy to supply a patch if there is interest.

> Link Parsing in http is flawed
> ------------------------------
>
>                 Key: WAGON-218
>                 URL: http://jira.codehaus.org/browse/WAGON-218
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-http, wagon-http-lightweight
>    Affects Versions: 1.0-beta-2
>            Reporter: Joakim Erdfelt
>            Assignee: Joakim Erdfelt
>
> The link parsing in wagon http has a few issues.
> a) not all links detected.
> b) the various ways that page content is identified via url string manipulation isn't
> working in many example cases.
> c) the use of jtidy introduces a large dependency and high memory usage.


        
