tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (TIKA-366) Increase buffer size for mime type sniffing
Date Wed, 20 Jan 2010 02:05:54 GMT

     [ https://issues.apache.org/jira/browse/TIKA-366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-366.

    Resolution: Fixed

- fixed in r901033

> Increase buffer size for mime type sniffing
> -------------------------------------------
>                 Key: TIKA-366
>                 URL: https://issues.apache.org/jira/browse/TIKA-366
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 0.5
>         Environment: My local MacBook pro laptop.
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.6
> While working on TIKA-357, which addresses a similar problem in charset detection, I found
> the same general problem in mime identification. Tika currently uses only the first
> MimeTypes#getMinLength() bytes of the input when sniffing the mime type from magic headers.
> The example file attached from Ken Krugler makes it clear that the current minimum length
> of 4 * 1024 bytes isn't enough. Extending it to 8K (8 * 1024 bytes) addresses this issue
> and seems to open up more opportunity for mime detection at little overhead cost.
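
To illustrate the effect described in the issue, here is a minimal, self-contained sketch in plain Java. It does not use Tika's API or reproduce Tika's matching logic. The class SniffWindowSketch, the helpers readPrefix and containsMagic, the "<html" marker, and the offset of 5000 are all hypothetical choices made for this example; the issue does not say where the relevant bytes sit in the attached file. The sketch only shows why a marker that starts past the 4K boundary is invisible to a 4 * 1024 byte window and visible to an 8 * 1024 byte one.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Tika code.
public class SniffWindowSketch {

    /** Reads up to 'limit' bytes from the start of the stream, then rewinds it. */
    static byte[] readPrefix(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit);
        try {
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) {
                    break; // end of stream before the window was filled
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset(); // leave the stream positioned at the start for the real parser
        }
    }

    /** Naive search for 'magic' anywhere inside the sniffed prefix. */
    static boolean containsMagic(byte[] prefix, byte[] magic) {
        outer:
        for (int i = 0; i + magic.length <= prefix.length; i++) {
            for (int j = 0; j < magic.length; j++) {
                if (prefix[i + j] != magic[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumed test input: 10K of whitespace with a marker starting at byte 5000.
        byte[] magic = "<html".getBytes(StandardCharsets.US_ASCII);
        byte[] doc = new byte[10 * 1024];
        Arrays.fill(doc, (byte) ' ');
        System.arraycopy(magic, 0, doc, 5000, magic.length);

        InputStream in = new BufferedInputStream(new ByteArrayInputStream(doc));
        System.out.println(containsMagic(readPrefix(in, 4 * 1024), magic)); // prints false
        System.out.println(containsMagic(readPrefix(in, 8 * 1024), magic)); // prints true
    }
}
```

The extra cost of the larger window in this sketch is one additional 4K of buffered reading per stream, which is in line with the "little overhead cost" noted in the issue.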

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
