tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1651) Add mime detection (and parsing?) for Microsoft Chart object
Date Mon, 04 Apr 2016 19:26:25 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-1651.
    Resolution: Fixed

Duplicate issue.

> Add mime detection (and parsing?) for Microsoft Chart object
> ------------------------------------------------------------
>                 Key: TIKA-1651
>                 URL: https://issues.apache.org/jira/browse/TIKA-1651
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>         Attachments: 11.xls, 428996.ppt, embedded_xls_stack_traces.csv
> With recently modified tika eval dev code that captures exceptions from embedded documents,
> there are ~30k exceptions in govdocs1 for what we're currently identifying as xls files embedded
> in ppt and xls files.
> It turns out that these are Microsoft Chart files/objects.  We are currently identifying
> them as xls.  Let's add mime detection to these embedded objects and see if we can use POI
> to parse the contents of embedded tables when there are embedded tables.

This message was sent by Atlassian JIRA