tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2104) Upgrade to a version of POI that fixes common bugs in macro extraction, when available
Date Mon, 03 Oct 2016 19:28:20 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-2104:
------------------------------
    Attachment: newExceptionsInBDetails.xlsx
                newExceptionsInBByMimeTypeByStackTrace.xlsx

I ran our batch code against ~800k MSOffice files without swallowing exceptions from macro
extraction.  I'm attaching the results.  We can use these to identify the exceptions and
prioritize fixes.

> Upgrade to a version of POI that fixes common bugs in macro extraction, when available
> --------------------------------------------------------------------------------------
>
>                 Key: TIKA-2104
>                 URL: https://issues.apache.org/jira/browse/TIKA-2104
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>         Attachments: newExceptionsInBByMimeTypeByStackTrace.xlsx, newExceptionsInBDetails.xlsx
>
>
> On TIKA-2069, we found two bugs in POI that prevented the extraction of macros from MSOffice
> files.  Let's use this issue to track fixes in POI.
> Current known POI bugs are:
> 60162
> 60158
> 59830
> 59858
> After we release Tika 1.14, let's remove the catch blocks in Tika and rerun against our
> regression corpus to help identify the most common bugs and find new ones.
> As always, patches are welcome on POI!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
