tika-dev mailing list archives

From "Benoit MAGGI (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-895) Empty title element makes Tika-generated HTML documents not open
Date Thu, 19 Apr 2012 09:37:11 GMT
Empty title element makes Tika-generated HTML documents not open

                 Key: TIKA-895
                 URL: https://issues.apache.org/jira/browse/TIKA-895
             Project: Tika
          Issue Type: Bug
          Components: metadata
    Affects Versions: 1.1
         Environment: Windows 7 
            Reporter: Benoit MAGGI
            Priority: Trivial

I tried to convert an empty docx file to an HTML file.
Example: java -jar tika-app-1.1.jar -x example.docx > t.html

The resulting HTML file cannot be opened in Firefox, Internet Explorer, or Chrome.

The root cause seems to be that a self-closing <title/> is forbidden by the HTML 4.01
specification (I could not find the equivalent statement for HTML5):
bq. http://www.w3.org/TR/html401/struct/global.html#h-7.4.2 
bq. 7.4.2 The TITLE element 
bq. <!-- The TITLE element is not considered part of the flow of text.
bq.        It should be displayed, for example as the page header or
bq.        window title. Exactly one title is required per document.
bq.     -->
bq. <!ELEMENT TITLE - - (#PCDATA) -(%head.misc;) -- document title -->
bq. <!ATTLIST TITLE %i18n>

bq. *Start tag: required, End tag: required*

For reference, the same bug previously occurred with xls files.

A simple fix would be to emit an empty title with explicit start and end tags by default.
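
Until the serializer is fixed, the generated file could be post-processed. The following is only a sketch of that workaround, not the Tika fix itself: it assumes the self-closing title is the only problem in the output, and the class and method names (TitleFix, expandEmptyTitle) are hypothetical and not part of Tika.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: rewrites a self-closing title
// element into an explicit start/end tag pair so browsers can parse it.
public class TitleFix {

    // Matches <title/> and <title />, in any letter case.
    private static final Pattern EMPTY_TITLE =
            Pattern.compile("<title\\s*/>", Pattern.CASE_INSENSITIVE);

    public static String expandEmptyTitle(String html) {
        // The replacement contains no '$' or '\', so it is inserted literally.
        return EMPTY_TITLE.matcher(html).replaceAll("<title></title>");
    }

    public static void main(String[] args) {
        String broken = "<html><head><title/></head><body><p>text</p></body></html>";
        System.out.println(expandEmptyTitle(broken));
    }
}
```

A title that already has content or explicit tags is left untouched, so running this over correct output should be harmless.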
