tika-dev mailing list archives

From "Mads Hansen (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-438) Parse and return the complete set of custom document properties from MS Office documents
Date Sun, 13 Jun 2010 16:31:31 GMT
Parse and return the complete set of custom document properties from MS Office documents
----------------------------------------------------------------------------------------

                 Key: TIKA-438
                 URL: https://issues.apache.org/jira/browse/TIKA-438
             Project: Tika
          Issue Type: Improvement
          Components: parser
    Affects Versions: 0.7
            Reporter: Mads Hansen


All custom properties of an MS Office document should be parsed and returned in the Metadata set.
This would be consistent with how all HTML meta tags are parsed and returned.

Custom properties are already parsed to produce the Metadata.LANGUAGE property when document
properties are normalized into the Dublin Core metadata set.  With minor modifications to the
org.apache.tika.parser.microsoft.SummaryExtractor class, the entire set of custom properties
could be obtained and set in the document metadata.
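
As a rough illustration of the change proposed above, the sketch below copies every custom
property into a metadata map instead of picking out only the language entry. It is not the
actual SummaryExtractor code: the class name CustomPropertySketch, the method
copyCustomProperties, and the "custom:" key prefix are all hypothetical, and plain
java.util maps stand in for the parsed custom properties and for Tika's Metadata object so
that the example runs without POI or Tika on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; names and the key prefix are assumptions, not Tika API.
public class CustomPropertySketch {

    // Assumed prefix to keep custom property names from colliding with standard keys.
    static final String PREFIX = "custom:";

    // Copies every non-null custom property into a string-valued metadata map.
    // The input map stands in for the custom properties read from the document;
    // the returned map stands in for the document's Metadata set.
    static Map<String, String> copyCustomProperties(Map<String, Object> customProperties) {
        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : customProperties.entrySet()) {
            Object value = entry.getValue();
            if (value == null) {
                continue; // skip properties that carry no value
            }
            // Values may be strings, numbers, booleans or dates; store their text form.
            metadata.put(PREFIX + entry.getKey(), String.valueOf(value));
        }
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("Language", "en");
        props.put("Reviewed", Boolean.TRUE);
        props.put("Revision", 7);
        props.put("Empty", null);
        System.out.println(copyCustomProperties(props));
    }
}
```

In a real patch the loop would presumably iterate over the custom properties that the
extractor already reads, and call Metadata.set(name, value) for each entry.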

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

