tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-975) LinkBuilder to optionally collapse anchor whitespace
Date Wed, 22 Aug 2012 18:54:42 GMT

     [ https://issues.apache.org/jira/browse/TIKA-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Krugler resolved TIKA-975.
------------------------------

    Resolution: Fixed
      Assignee: Ken Krugler

Committed patch in r1376190. Thanks, Markus!

                
> LinkBuilder to optionally collapse anchor whitespace
> ----------------------------------------------------
>
>                 Key: TIKA-975
>                 URL: https://issues.apache.org/jira/browse/TIKA-975
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.2
>            Reporter: Markus Jelsma
>            Assignee: Ken Krugler
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: TIKA-975-1.3-1.patch, TIKA-975-1.3-2.patch
>
>
> Links extracted by the LinkContentHandler contain the verbatim anchor text. This is usually
> fine, but many websites spread the anchor text over multiple lines or indent it with tabs
> or spaces.
> This patch adds a boolean option to LinkContentHandler that toggles whitespace collapsing
> on or off. The default behaviour remains as-is and the API remains backward compatible.
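
For illustration only, here is a minimal sketch of what collapsing anchor whitespace could look like. It is not the committed patch: the class AnchorWhitespace, the method anchorText, and the flag name collapseWhitespace are hypothetical, and the sketch assumes that "collapsing" means replacing each run of whitespace with a single space and trimming both ends.

```java
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not use or reproduce Tika's code.
class AnchorWhitespace {
    // Matches one or more whitespace characters (spaces, tabs, line breaks).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the anchor text verbatim when the flag is off (the default
    // behaviour described in the issue), or with each whitespace run reduced
    // to a single space and the ends trimmed when the flag is on.
    static String anchorText(String raw, boolean collapseWhitespace) {
        if (!collapseWhitespace) {
            return raw;
        }
        return WHITESPACE.matcher(raw).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Anchor text as it might appear in indented, multi-line HTML.
        String raw = "\n\t  Apache\n\t  Tika  \n";
        System.out.println("[" + anchorText(raw, false) + "]");
        System.out.println("[" + anchorText(raw, true) + "]");
    }
}
```

Keeping the flag off by default mirrors the backward-compatible behaviour the issue describes: existing callers continue to receive the verbatim anchor text.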

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
