tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2726) Handle truncated ooxml more robustly
Date Fri, 14 Sep 2018 13:21:00 GMT
Tim Allison created TIKA-2726:

             Summary: Handle truncated ooxml more robustly
                 Key: TIKA-2726
                 URL: https://issues.apache.org/jira/browse/TIKA-2726
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison

With the move from Tika 1.18 to 1.19, detection of some truncated OOXML files now falls back
to the generic {{tika-ooxml}} type instead of a more specific one.  In the attached example,
which Excel is able to repair, 1.18 identifies the file as
{{application/vnd.ms-excel.sheet.macroenabled.12}} and the ooxml parser parses it without
exception.  In 1.19, however, the file is identified as {{tika-ooxml}} and is then parsed by
the Package parser.
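
For illustration only, here is a rough sketch of why truncation makes these files harder to classify. A zip's central directory sits at the end of the file, so a truncated OOXML package usually cannot be opened by a reader that relies on it, while a streaming pass over the local file headers can still recover the entry names. This is not Tika's detection code: the helper {{list_local_entries}}, the sample entry names, and the truncation point are all assumptions made up for the example, using only the Python standard library.

```python
import io
import struct
import zipfile

# Fixed-size part of a zip local file header (30 bytes, little-endian).
LOCAL_HEADER = struct.Struct("<4s5H3I2H")
LOCAL_SIG = b"PK\x03\x04"


def list_local_entries(data):
    """Hypothetical helper: walk local file headers from the start of the
    buffer and return the entry names that can still be read."""
    names = []
    pos = 0
    while pos + LOCAL_HEADER.size <= len(data):
        (sig, _ver, flags, _method, _mtime, _mdate,
         _crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, pos)
        if sig != LOCAL_SIG:
            break
        name_start = pos + LOCAL_HEADER.size
        name_end = name_start + name_len
        if name_end > len(data):
            break  # header itself is cut off
        names.append(data[name_start:name_end].decode("utf-8", "replace"))
        if flags & 0x08:
            break  # sizes live in a trailing data descriptor; cannot skip ahead
        pos = name_end + extra_len + csize
    return names


# Build a tiny stand-in for an OOXML spreadsheet package (made-up content).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")
full = buf.getvalue()

# Simulate truncation: drop the central directory and a few bytes of data.
truncated = full[: full.find(b"PK\x01\x02") - 5]

# A reader that needs the central directory rejects the truncated file.
try:
    zipfile.ZipFile(io.BytesIO(truncated))
    opened_normally = True
except zipfile.BadZipFile:
    opened_normally = False

# A streaming scan still sees the entry names.
recovered = list_local_entries(truncated)
print(opened_normally, recovered)
```

In this toy case the entry names survive the truncation, so a detector working from a streaming read would still have {{[Content_Types].xml}} and the {{xl/}} prefix available as hints, even though the package cannot be opened normally.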

