tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-531) xmpTPg:NPages creates invalid XML
Date Mon, 01 Nov 2010 00:17:23 GMT

    [ https://issues.apache.org/jira/browse/TIKA-531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926796#action_12926796 ]

Jukka Zitting commented on TIKA-531:
------------------------------------

How is the output invalid XML? The name attribute in <meta name="xmpTPg:NPages" content="..."/>
is defined as a plain CDATA attribute by XHTML, so a parser shouldn't try to parse its contents
as an XML name.
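A minimal Python sketch of this point, using only the standard library's namespace-aware `xml.etree.ElementTree` parser. The XHTML snippet and the page count value "3" are made up for illustration; it assumes the output looks roughly like `<meta name="xmpTPg:NPages" content="..."/>` inside an XHTML `<head>`. The helper `parses` is hypothetical, not part of Tika.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative XHTML head; the content value is invented for the example.
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta name="xmpTPg:NPages" content="3"/>'
    '</head><body/></html>'
)

def parses(xml_text: str) -> bool:
    """Hypothetical helper: True if a namespace-aware parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# The "xmpTPg:" prefix only appears inside an attribute value, so no
# xmlns:xmpTPg declaration is needed and parsing succeeds.
root = ET.fromstring(doc)
meta = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}meta")
print(meta.get("name"), meta.get("content"))  # xmpTPg:NPages 3

# By contrast, the same string used as an element name without a namespace
# declaration is rejected by the parser (an "unbound prefix" error).
print(parses("<xmpTPg:NPages>3</xmpTPg:NPages>"))  # False
```

Namespace checking applies to element and attribute names, not to attribute values, which is why only the second case fails.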

Note that down the line we may want to switch to something like RDFa for serializing metadata
attributes, but for now the metadata names should be treated just as plain strings even though
the xmp ones look like XML names with their prefixes.
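On the consuming side, a minimal sketch of treating metadata names as plain strings: collect each `<meta>` name/content pair into a dictionary keyed by the literal name, without resolving any prefix. `read_meta` is a hypothetical helper, and the sample document (including its `dc:title` entry) is illustrative, not actual Tika output.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def read_meta(xhtml: str) -> dict[str, str]:
    """Hypothetical helper: map each <meta> name to its content, names kept opaque."""
    root = ET.fromstring(xhtml)
    result: dict[str, str] = {}
    for meta in root.iter(f"{{{XHTML_NS}}}meta"):
        name = meta.get("name")
        if name is not None:
            # Do not split "xmpTPg:NPages" into prefix and local part or look
            # up a namespace for it; the whole string is the key.
            result[name] = meta.get("content", "")
    return result

# Illustrative input only.
sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<meta name="xmpTPg:NPages" content="3"/>'
    '<meta name="dc:title" content="Example"/>'
    '</head><body/></html>'
)

print(read_meta(sample))  # {'xmpTPg:NPages': '3', 'dc:title': 'Example'}
```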

> xmpTPg:NPages creates invalid XML
> ---------------------------------
>
>                 Key: TIKA-531
>                 URL: https://issues.apache.org/jira/browse/TIKA-531
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 0.8
>            Reporter: Sjoerd Smeets
>             Fix For: 0.8
>
>
> Hi,
> Parsing MS Office files or PDF documents results in invalid XML, as there is a missing namespace
> definition for xmpTPg:NPages. What would be the best approach: renaming this field or adding
> the namespace definition to the header of the output XML?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

