tika-dev mailing list archives

From "Marco Quaranta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-976) Inaccurate XLS detection through POIFSContainerDetector
Date Wed, 12 Sep 2012 16:16:07 GMT

    [ https://issues.apache.org/jira/browse/TIKA-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454089#comment-13454089 ]

Marco Quaranta commented on TIKA-976:

I will investigate further by asking a colleague; however, the file should have been created in Excel.
> Inaccurate XLS detection through POIFSContainerDetector
> -------------------------------------------------------
>                 Key: TIKA-976
>                 URL: https://issues.apache.org/jira/browse/TIKA-976
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: Marco Quaranta
>              Labels: detection, mime, poi, xls
>         Attachments: test_book.xls
> I've found an inaccurate detection with the attached xls file. POIFSContainerDetector
> is unable to determine the exact mimetype (vnd.ms-excel) and returns the generic "x-tika-msoffice".
> This is because the file's root entry names are [Book, DocumentSummaryInformation, SummaryInformation],
> while POIFSContainerDetector only checks whether the names contain "Workbook".
> Could it be possible to add a further or-check like this:
> if (names.contains("Workbook") || names.contains("Book"))
> Thank you,
> Marco
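
For illustration only, the proposed or-check can be sketched as a small standalone Java program. The class name RootEntryDetectionSketch and the method detectFromRootNames are hypothetical and are not part of Tika's API; the real POIFSContainerDetector reads the root entry names from the OLE2 container itself, whereas this sketch simply takes them as a set of strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RootEntryDetectionSketch {

    // Hypothetical helper (not Tika's actual method): maps the root entry
    // names of an OLE2 container to a media type string.
    static String detectFromRootNames(Set<String> names) {
        // Proposed change from the issue: accept "Book" in addition to "Workbook".
        if (names.contains("Workbook") || names.contains("Book")) {
            return "application/vnd.ms-excel";
        }
        // Generic fallback when no known entry name is present.
        return "application/x-tika-msoffice";
    }

    public static void main(String[] args) {
        // Root entry names reported for the attached test_book.xls.
        Set<String> names = new HashSet<>(Arrays.asList(
                "Book", "DocumentSummaryInformation", "SummaryInformation"));
        System.out.println(detectFromRootNames(names)); // prints application/vnd.ms-excel
    }
}
```

With only the "Workbook" check, the same set of names would fall through to the generic type, which matches the behaviour described in the report.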
