tika-dev mailing list archives

From "Simon Tyler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (TIKA-391) Intermittent errors detecting xls files
Date Wed, 24 Mar 2010 09:31:27 GMT

     [ https://issues.apache.org/jira/browse/TIKA-391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Tyler updated TIKA-391:

    Attachment: MimeTypes.java

Attached is an updated version of MimeTypes.java based on the 0.6 code base. This is tested
and solves the problem. The resource name and content type hints now pick a match from the
returned list.

The only changes are the addition of the getMimeTypes method and its usage in the detect method.

A fuller fix for this issue should probably address all the other forms of getMimeType. We
could also consider what happens if the two hints hit different magic matches. 
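
To illustrate the idea of letting the hints pick from a list of magic matches, here is a
rough, self-contained sketch. It is not the attached MimeTypes.java and does not use the
Tika API: the class name, the method signatures, the hard-coded OLE2 signature check and the
extension table are all assumptions made for the example. It also assumes that both
candidate types match the same magic bytes, which is what makes the hint necessary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the detection logic; not Tika code.
class HintedDetection {

    // First bytes of the OLE2 compound document signature, shared by .doc and .xls files.
    static final byte[] OLE2_MAGIC = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0};

    // Return every type whose magic matches, in a fixed order, instead of a single winner.
    static List<String> getMimeTypes(byte[] prefix) {
        List<String> matches = new ArrayList<>();
        if (startsWith(prefix, OLE2_MAGIC)) {
            matches.add("application/msword");
            matches.add("application/vnd.ms-excel");
        }
        return matches;
    }

    // Assumed extension table for the resource name hint; null means "no hint".
    static String typeForName(String name) {
        if (name == null) {
            return null;
        }
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".xls")) {
            return "application/vnd.ms-excel";
        }
        if (lower.endsWith(".doc")) {
            return "application/msword";
        }
        return null;
    }

    // Use the name hint only to choose among the magic matches.
    static String detect(byte[] prefix, String resourceName) {
        List<String> candidates = getMimeTypes(prefix);
        String hint = typeForName(resourceName);
        if (candidates.isEmpty()) {
            return hint != null ? hint : "application/octet-stream";
        }
        if (hint != null && candidates.contains(hint)) {
            return hint;
        }
        // No usable hint: fall back to the first match, which is at least deterministic.
        return candidates.get(0);
    }

    static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ole2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, 0, 0};
        // prints "application/vnd.ms-excel"
        System.out.println(detect(ole2, "testEXCEL.xls"));
    }
}
```

In this sketch a hint that does not appear among the magic matches is simply ignored, and
only one hint is modelled. How to resolve two hints that select different matches is left
open here, as it is in the comment above.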


> Intermittent errors detecting xls files
> ---------------------------------------
>                 Key: TIKA-391
>                 URL: https://issues.apache.org/jira/browse/TIKA-391
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 0.6
>            Reporter: Simon Tyler
>         Attachments: MimeTypes.java
> I am doing some testing of Tika 0.6 and noticed some odd results for the testEXCEL.xls
> file included in the test suite.
> 100 calls to the following code:
>             is = new BufferedInputStream(new FileInputStream(filename));
>             Metadata metadata = new Metadata();
>             metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
>             String type = tika.detect(is, metadata);
> Results in different matches as application/msword or application/vnd.ms-excel seemingly
> at random.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
