tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-2726) Handle truncated ooxml more robustly
Date Thu, 03 Jan 2019 20:34:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-2726.
-------------------------------
       Resolution: Duplicate
         Assignee: Tim Allison
    Fix Version/s: 1.21
                   2.0.0

> Handle truncated ooxml more robustly
> ------------------------------------
>
>                 Key: TIKA-2726
>                 URL: https://issues.apache.org/jira/browse/TIKA-2726
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>            Priority: Major
>             Fix For: 2.0.0, 1.21
>
>         Attachments: C46JNLYFA3HSONOL3IF6LTQO3ZO5JI65.xlsm, C46JNLYFA3HSONOL3IF6LTQO3ZO5JI65_1.18.json, C46JNLYFA3HSONOL3IF6LTQO3ZO5JI65_1.19-pre-rc1.json
>
>
> With the move from Tika 1.18 to 1.19, we're now failing to detect some truncated OOXML
> files more specifically than {{tika-ooxml}}.  In the attached example, which Excel is able
> to repair, 1.18 identifies the file as {{application/vnd.ms-excel.sheet.macroenabled.12}}
> and the ooxml parser parses it without exception.  In 1.19, however, the same file is
> identified only as {{tika-ooxml}} and is then parsed by the Package parser.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
