[ https://issues.apache.org/jira/browse/TIKA-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Guillaumin updated TIKA-914:
------------------------------------

    Issue Type: Improvement  (was: Bug)

I think this is valid *XML* in the syntactic sense, which is why validators pass it, but it is not valid *XHTML* according to the spec. I agree it's a deficiency of either the spec or Chrome (or both). Tika should nevertheless account for it if possible.

Moved to Improvement, then. Thanks!

> Invalid self-closing title tag when parsing an RTF file
> -------------------------------------------------------
>
>                 Key: TIKA-914
>                 URL: https://issues.apache.org/jira/browse/TIKA-914
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.1
>         Environment: Reproduced on Linux and Windows
>            Reporter: Nicolas Guillaumin
>            Priority: Minor
>              Labels: rtf
>         Attachments: test.rtf
>
>
> When parsing an RTF file with an empty TITLE metadata, the resulting HTML contains a self-closing title tag:
> {code}
> $ java -jar tika-app-1.1.jar -h test.rtf
> [...]
> <title/>
> </head>
> [...]
> {code}
> I believe self-closing tags are not valid in XHTML for elements such as title that are not declared EMPTY, according to http://www.w3.org/TR/xhtml1/#C_3 (although no XHTML doctype is generated here, just a namespace). In any case, this causes some browsers, such as Chrome, to fail to parse the HTML and display a blank page.
> The expected output would be a non-self-closing empty tag: {{<title></title>}}
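
For readers who want to see the difference between the two serializations discussed above, the following is a minimal sketch, not Tika's actual code path. It assumes the JDK's built-in `javax.xml.transform` serializer and uses a hypothetical class name, `EmptyTitleDemo`. It feeds an empty, non-namespaced `title` element through a SAX `TransformerHandler` and returns the text produced for a given output method.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class; it is not part of Tika.
class EmptyTitleDemo {

    // Serializes a single empty <title> element with the given output
    // method ("xml" or "html") and returns the resulting markup.
    static String serialize(String method) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();

            Transformer transformer = handler.getTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, method);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, "no");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            // Emit an element with no content, as happens when the
            // document's TITLE metadata is empty.
            handler.startDocument();
            handler.startElement("", "title", "title", new AttributesImpl());
            handler.endElement("", "title", "title");
            handler.endDocument();

            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("xml : " + serialize("xml"));
        System.out.println("html: " + serialize("html"));
    }
}
```

With the JDK's default serializer, the `xml` method is expected to minimize the empty element to `<title/>`, while the `html` method writes the explicit pair `<title></title>`. The element here deliberately has no namespace; elements in the XHTML namespace may be treated as plain XML even under the `html` method, so this sketch illustrates the general serializer behavior and should not be read as a diagnosis of the Tika output.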