tika-dev mailing list archives

From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2033) Value attributes of input elements not extracted from HTML
Date Thu, 14 Jul 2016 20:39:20 GMT
Luis Filipe Nassif created TIKA-2033:
----------------------------------------

             Summary: Value attributes of input elements not extracted from HTML 
                 Key: TIKA-2033
                 URL: https://issues.apache.org/jira/browse/TIKA-2033
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 1.10
         Environment: Windows 7, java8 x64
            Reporter: Luis Filipe Nassif
            Priority: Minor


The text in the value attribute of input elements is currently not extracted from HTML files,
even though browsers render it. I tried using IdentityHtmlMapper and experimented with HtmlSchema,
with no luck. A simple test HTML is below:

<HTML><body><input value='text'></input></body></HTML>
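
For comparison, here is a minimal sketch of the output one would hope to get for the test HTML above. It does not use Tika at all: it swaps in the JDK's own Swing HTML parser (javax.swing.text.html) purely to illustrate collecting the value attributes of input elements. The class name InputValueSketch and the helper inputValues are hypothetical names made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration only: uses the JDK Swing HTML parser, not Tika.
final class InputValueSketch {

    // Returns the value attribute of every <input> element, in document order.
    static List<String> inputValues(String html) throws IOException {
        final List<String> values = new ArrayList<>();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> is an empty element, so it is normally reported here.
                collect(tag, attrs);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Defensive: also check start tags in case the parser reports it this way.
                collect(tag, attrs);
            }

            private void collect(HTML.Tag tag, MutableAttributeSet attrs) {
                if (tag == HTML.Tag.INPUT) {
                    Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                    if (value != null) {
                        values.add(value.toString());
                    }
                }
            }
        };

        // The third argument (ignoreCharSet = true) skips charset checks for in-memory input.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return values;
    }

    public static void main(String[] args) throws IOException {
        String html = "<HTML><body><input value='text'></input></body></HTML>";
        System.out.println(inputValues(html));
    }
}
```

The improvement requested here would be for the Tika HTML parser to emit the same text (the string "text" for the sample above) as part of its extracted content.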



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
