tika-dev mailing list archives

From "Pascal Essiembre (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2922) Regression issue with detecting .dotx and .xlam MS Office mime-types
Date Mon, 12 Aug 2019 06:59:00 GMT
Pascal Essiembre created TIKA-2922:
--------------------------------------

             Summary: Regression issue with detecting .dotx and .xlam MS Office mime-types
                 Key: TIKA-2922
                 URL: https://issues.apache.org/jira/browse/TIKA-2922
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.22
         Environment: N/A
            Reporter: Pascal Essiembre


After upgrading to 1.22, .dotx and .xlam files are no longer detected properly. 

They are now detected as:
{noformat}
.dotx -> application/vnd.ms-word.template.macroenabled.12
.xlam -> application/x-tika-ooxml{noformat}

They should be detected as they were before the upgrade:
{noformat}
.dotx -> application/vnd.openxmlformats-officedocument.wordprocessingml.template
.xlam -> application/vnd.ms-excel.addin.macroenabled.12{noformat}
Reference: [https://docs.microsoft.com/en-us/previous-versions/office/office-2007-resource-kit/ee309278(v=office.12)]
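
A regression like this usually comes down to a wrong or missing entry in a lookup table. The Java sketch below is hypothetical: `OoxmlTypeMapping` and `toMime` are not Tika names, and it assumes the detector keys off the content type of the package's main part. The keys are the main-part content types Office declares for .dotx, .dotm and .xlam files; the values are the detections expected above. The .dotm row is there to show that a template should only be reported as macro-enabled when its main part says so. The `application/x-tika-ooxml` fallback mirrors what the .xlam file is reported as in 1.22 and is an assumption of this sketch, not a description of Tika's code.

```java
import java.util.Locale;
import java.util.Map;

public class OoxmlTypeMapping {
    // Hypothetical table: main-part content type -> MIME type to report.
    // Keys are lower-cased because MIME types compare case-insensitively.
    static final Map<String, String> MAIN_PART_TO_MIME = Map.of(
        // .dotx: plain (macro-free) Word template
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
        // .dotm: macro-enabled Word template, kept distinct from .dotx
        "application/vnd.ms-word.template.macroenabledtemplate.main+xml",
        "application/vnd.ms-word.template.macroenabled.12",
        // .xlam: macro-enabled Excel add-in
        "application/vnd.ms-excel.addin.macroenabled.main+xml",
        "application/vnd.ms-excel.addin.macroenabled.12");

    /** Maps a main-part content type; unknown types fall back to a generic OOXML type (sketch only). */
    static String toMime(String mainPartContentType) {
        String key = mainPartContentType.toLowerCase(Locale.ROOT);
        return MAIN_PART_TO_MIME.getOrDefault(key, "application/x-tika-ooxml");
    }
}
```

With a table like this, a .dotx main part cannot be mistaken for the macro-enabled template, and a .xlam main part has an explicit entry instead of falling through to the generic fallback.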

The incorrect mappings are in StreamingZipContainerDetector and ZipContainerDetectorBase.
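
To make clear what such a mapping is keyed on, here is a Java sketch of one general way to find the main part of an OOXML (OPC) package and read its declared content type. It does not reproduce Tika's detectors. It assumes the main part is the target of the `officeDocument` relationship in `_rels/.rels` and that its content type is declared by an `<Override>` in `[Content_Types].xml`, as Office-generated files normally do; `<Default>` entries are not consulted. The class and method names (`OoxmlMainPart`, `mainPartContentType`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OoxmlMainPart {
    // Relationship type that marks the package's main part.
    static final String OFFICE_DOCUMENT_REL =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    /** Returns the main part's declared content type, or null if it cannot be found (sketch only). */
    public static String mainPartContentType(byte[] zipBytes) {
        try {
            // Read only the two package-level files this lookup needs.
            Map<String, byte[]> parts = new HashMap<>();
            try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
                ZipEntry entry;
                while ((entry = zin.getNextEntry()) != null) {
                    String name = entry.getName();
                    if (name.equals("_rels/.rels") || name.equals("[Content_Types].xml")) {
                        parts.put(name, zin.readAllBytes());
                    }
                }
            }
            byte[] rels = parts.get("_rels/.rels");
            byte[] types = parts.get("[Content_Types].xml");
            if (rels == null || types == null) {
                return null; // not an OPC package
            }

            // Step 1: find the target of the officeDocument relationship.
            String target = null;
            NodeList relList = parse(rels).getElementsByTagNameNS("*", "Relationship");
            for (int i = 0; i < relList.getLength() && target == null; i++) {
                Element rel = (Element) relList.item(i);
                if (OFFICE_DOCUMENT_REL.equals(rel.getAttribute("Type"))) {
                    target = rel.getAttribute("Target");
                }
            }
            if (target == null) {
                return null;
            }
            String partName = target.startsWith("/") ? target : "/" + target;

            // Step 2: look up the part's <Override>; OPC part names compare case-insensitively.
            NodeList overrides = parse(types).getElementsByTagNameNS("*", "Override");
            for (int i = 0; i < overrides.getLength(); i++) {
                Element override = (Element) overrides.item(i);
                if (partName.equalsIgnoreCase(override.getAttribute("PartName"))) {
                    return override.getAttribute("ContentType");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable OOXML package", e);
        }
    }

    private static Document parse(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since the input is untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }
}
```

For a .dotx file this would return the `...wordprocessingml.template.main+xml` content type, which a correct mapping turns into the non-macro template type listed above.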

I will submit a pull request shortly with the correct mapping.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)