tika-dev mailing list archives

From "Richard Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2474) Mime type is vnd.apple.unknown.13 for valid keynote file
Date Mon, 09 Oct 2017 19:31:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Davidson updated TIKA-2474:
-----------------------------------
    Priority: Minor  (was: Major)

> Mime type is vnd.apple.unknown.13 for valid keynote file
> --------------------------------------------------------
>
>                 Key: TIKA-2474
>                 URL: https://issues.apache.org/jira/browse/TIKA-2474
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Richard Davidson
>            Priority: Minor
>         Attachments: Untitled.key
>
>
> When I try to detect the MIME subtype of the attached Keynote file, I get vnd.apple.unknown.13.
>
> I think the code that handles Keynote files in Tika is in https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/iwork/iwana/IWork13PackageParser.java
> and the specific code is:
> {code}
>         public static MediaType detect(ZipFile zipFile) {
>             ZipArchiveEntry entry = zipFile.getEntry("Index/MasterSlide.iwa");
>             if (zipFile.getEntry("Index/MasterSlide.iwa") != null ||
>                     zipFile.getEntry("Index/Slide.iwa") != null) {
>                 return KEYNOTE13.getType();
>             }
>             //TODO: figure out how to distinguish numbers from pages
>             return UNKNOWN13.getType();
>         }
> {code}
> My file contains neither Index/Slide.iwa nor Index/MasterSlide.iwa, but it does contain
> multiple files such as MasterSlide-3857.iwa and Slide-3885.iwa. I think the detection logic
> should use a regex to check for MasterSlide-*.iwa or Slide-*.iwa entries.
> If people agree with this approach, I can submit a pull request.
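A minimal sketch of the proposed check, written against plain entry-name strings instead of a real ZipFile so it runs on its own. The class name, method names, and the exact regex are hypothetical and not part of Tika. The regex also accepts the unnumbered names from the existing check and an optional directory prefix such as Index/; the report does not say which directory the numbered entries live in, so that part is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class KeynoteEntryCheck {

    // Hypothetical pattern: MasterSlide.iwa or Slide.iwa with an optional
    // numeric suffix (e.g. MasterSlide-3857.iwa), optionally preceded by a
    // directory path such as "Index/" (the directory is an assumption).
    private static final Pattern SLIDE_ENTRY =
            Pattern.compile("(?:.*/)?(?:MasterSlide|Slide)(?:-\\d+)?\\.iwa");

    // Mirrors the two exact-name lookups in the quoted detect() method.
    static boolean matchesExactNames(List<String> entryNames) {
        return entryNames.contains("Index/MasterSlide.iwa")
                || entryNames.contains("Index/Slide.iwa");
    }

    // Proposed check: true if any entry name matches SLIDE_ENTRY.
    static boolean matchesSlidePattern(List<String> entryNames) {
        for (String name : entryNames) {
            if (SLIDE_ENTRY.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Entry names as given in the report for the attached file.
        List<String> entries = Arrays.asList("MasterSlide-3857.iwa", "Slide-3885.iwa");
        System.out.println("exact names: " + matchesExactNames(entries));  // false
        System.out.println("pattern:     " + matchesSlidePattern(entries)); // true
    }
}
```

To apply this inside the quoted detect(ZipFile) method, one option would be to iterate zipFile.getEntries() from Commons Compress and test each entry's getName() against the same pattern, in place of the two exact getEntry lookups.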



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
