tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1039) Raw image file detected as audio/mpeg
Date Mon, 04 Feb 2013 17:06:12 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570390#comment-13570390 ]

Nick Burch commented on TIKA-1039:

I'm not sure there's much that we can do about this, as the starting "magic" for MP3 files
is a little generic. We could possibly add a second check for another frame header of a
similar type (VBR means the first two frame headers may differ) one frame length after the
first, but the mime magic checking for that would be pretty icky given the current structure.
http://www.mars.org/pipermail/mad-dev/2002-January/000425.html indicates how many bytes an
audio frame could be (ID3 frames can be much, much larger, and one of those could come first).
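
As a rough illustration of the second-header idea, here is a minimal Python sketch. It is
not Tika code and not how Tika's mime magic works. The function names are hypothetical, and
the sketch assumes MPEG-1 Layer III only, no leading ID3 tag, and no free-format bitrate.
The second header is allowed to carry a different bitrate, since VBR files may vary it.

```python
# Hypothetical sketch: accept data as MP3 only if a second frame header
# sits exactly one frame length after the first one.
# Assumption: MPEG-1 Layer III only; tables are indexed by the header bit fields.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96,
                 112, 128, 160, 192, 224, 256, 320, None]  # 0 = free, 15 = invalid
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]              # 3 = reserved


def frame_length(data, offset):
    """Return the frame length in bytes if a valid MPEG-1 Layer III
    header starts at offset, otherwise None."""
    if offset < 0 or offset + 4 > len(data):
        return None
    b0, b1, b2 = data[offset], data[offset + 1], data[offset + 2]
    if b0 != 0xFF or (b1 & 0xE0) != 0xE0:  # 11-bit sync word
        return None
    if (b1 >> 3) & 0x03 != 0x03:           # version bits: 11 = MPEG-1
        return None
    if (b1 >> 1) & 0x03 != 0x01:           # layer bits: 01 = Layer III
        return None
    bitrate = BITRATES_KBPS[b2 >> 4]
    sample_rate = SAMPLE_RATES_HZ[(b2 >> 2) & 0x03]
    if bitrate is None or sample_rate is None:
        return None
    padding = (b2 >> 1) & 0x01
    return 144 * bitrate * 1000 // sample_rate + padding


def looks_like_mp3(data):
    """True only if two consecutive frame headers are found."""
    first = frame_length(data, 0)
    if first is None:
        return False
    # The second header may use a different bitrate (VBR), so only its
    # validity is checked, not equality with the first header.
    return frame_length(data, first) is not None
```

A caller would pass in the first few kilobytes of the file. A single matching header
followed by unrelated bytes is rejected, which is the case a two-byte magic cannot tell apart.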
> Raw image file detected as audio/mpeg
> -------------------------------------
>                 Key: TIKA-1039
>                 URL: https://issues.apache.org/jira/browse/TIKA-1039
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.2
>            Reporter: Oliver Boldt
>         Attachments: SimpleTestFile.raw
> A raw image file that starts with a long sequence of FFFF.... is recognised as audio/mpeg.
> The problem is that the raw file does not have a magic number itself and the FF...-pixeldata
> is wrongly interpreted as an mpeg file. The bug seems to be a general problem, because other
> image data could be misinterpreted as other magic numbers.

