tika-dev mailing list archives

From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2033) Value attributes of input elements not extracted from HTML
Date Thu, 14 Jul 2016 21:11:20 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378386#comment-15378386 ]

Luis Filipe Nassif commented on TIKA-2033:
------------------------------------------

Is it OK to put the text into an "input" element in the resulting XHTML?

> Value attributes of input elements not extracted from HTML 
> -----------------------------------------------------------
>
>                 Key: TIKA-2033
>                 URL: https://issues.apache.org/jira/browse/TIKA-2033
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.10
>         Environment: Windows 7, java8 x64
>            Reporter: Luis Filipe Nassif
>            Priority: Minor
>
> The text of value attributes of input elements is currently not extracted from HTML files.
> Note that browsers do render it. I tried using IdentityHtmlMapper and played with HtmlSchema,
> with no luck. A simple test HTML document is below:
> <HTML><body><input value='text'></input></body></HTML>
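
To make the expected result concrete, here is a minimal sketch of the behaviour the issue asks for: collecting the value attribute of every input element. It is not Tika code and does not use the Tika API. It relies only on the JDK's SAX parser, which works here only because the test HTML above happens to be well-formed XML. The class name InputValueExtractor and the method extractInputValues are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: illustrates the desired extraction only.
public class InputValueExtractor {

    // Returns the "value" attribute of each <input> element, in document order.
    // Assumes the markup is well-formed XML; real-world HTML usually is not.
    public static List<String> extractInputValues(String markup) throws Exception {
        List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so match on qName.
                if ("input".equalsIgnoreCase(qName)) {
                    String value = atts.getValue("value");
                    if (value != null) {
                        values.add(value);
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return values;
    }

    public static void main(String[] args) throws Exception {
        String html = "<HTML><body><input value='text'></input></body></HTML>";
        System.out.println(extractInputValues(html)); // prints [text]
    }
}
```

Whether Tika should emit this text as character content or keep it on an "input" element in the output XHTML is the open question in the comment above; the sketch only shows which string is expected to surface.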



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
