tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-366) Increase buffer size for mime type sniffing
Date Wed, 20 Jan 2010 02:01:54 GMT
Increase buffer size for mime type sniffing

                 Key: TIKA-366
                 URL: https://issues.apache.org/jira/browse/TIKA-366
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 0.5
         Environment: My local MacBook Pro laptop.
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
             Fix For: 0.6

While working on TIKA-357, which addresses a similar problem for charset detection, I found
the same general problem in MIME type identification. Tika currently looks at only the first
MimeTypes#getMinLength() bytes of a document when sniffing its MIME type. The example file
attached from Ken Krugler shows that the current minimum length of 4 * 1024 bytes isn't
enough. Extending it to 8K (8 * 1024 bytes) addresses this issue and should open up more
opportunities for MIME detection at little overhead cost.
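
To illustrate the effect of the sniffing window, here is a minimal sketch in plain Java. It is
not Tika's implementation: the class SniffSketch and the methods readPrefix and containsMagic
are hypothetical names made up for this example, and the input is assumed to be a stream that
supports mark/reset. It reads a bounded prefix, rewinds the stream, and searches the prefix
for a magic byte sequence.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical illustration only; this is not Tika's MimeTypes code.
class SniffSketch {

    // Read up to 'limit' bytes from the head of the stream, then rewind it
    // so that a later parser still sees the whole document.
    static byte[] readPrefix(InputStream in, int limit) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(limit);
            byte[] buf = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buf, total, limit - total);
                if (n == -1) {
                    break; // stream is shorter than the sniffing window
                }
                total += n;
            }
            in.reset();
            return Arrays.copyOf(buf, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Naive search: true if 'magic' occurs anywhere inside 'prefix'.
    static boolean containsMagic(byte[] prefix, byte[] magic) {
        outer:
        for (int i = 0; i + magic.length <= prefix.length; i++) {
            for (int j = 0; j < magic.length; j++) {
                if (prefix[i + j] != magic[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }
}
```

With a made-up input whose identifying bytes start after offset 4096 (for example, a long run
of whitespace before the first tag), a 4 * 1024 byte prefix never contains the pattern, while
an 8 * 1024 byte prefix does. The cost is one extra 4 KB of buffered reading per document.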

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
