tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-939) Windows Media Video file detected as Windows Media Audio
Date Wed, 13 Jun 2012 15:38:42 GMT

    [ https://issues.apache.org/jira/browse/TIKA-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294527#comment-13294527 ]

Nick Burch commented on TIKA-939:

WMA and WMV use the same container format (ASF), so telling them apart based on mime magic
alone is tricky. Ideally, we want a container-aware detector for these kinds of formats, much
as we already have for things like ZIP, OLE2 and Ogg.

In the absence of a proper ASF container-aware detector, we have to fudge things by looking
for magic strings in a file that has the ASF magic bytes at the front. Alas, there's no
obvious series of bytes we can look for that conclusively says "this is a video", so for now
we just look for the video codec names in the first 8 KB. Your file used a video codec that
wasn't one of the names we look for, so no video match was found. The audio codec it uses
was found, so Tika assumed the file was audio.
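
For illustration, the heuristic described above can be sketched roughly as follows. This is
not Tika's actual detector or its mime magic definitions: the class name, the marker strings
and the "video/x-ms-asf" fallback type are assumptions made for the example, and it assumes
codec names are stored as UTF-16LE text, as ASF does for its string fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of "ASF magic plus codec-name sniffing"; not Tika code. */
final class AsfMagicSniffer {
    // ASF Header Object GUID as it appears at the start of an ASF file.
    static final byte[] ASF_MAGIC = {
        0x30, 0x26, (byte) 0xB2, 0x75, (byte) 0x8E, 0x66, (byte) 0xCF, 0x11,
        (byte) 0xA6, (byte) 0xD9, 0x00, (byte) 0xAA, 0x00, 0x62, (byte) 0xCE, 0x6C
    };
    // Only the first 8 KB of the file is searched for codec names.
    static final int WINDOW = 8 * 1024;

    // Assumed example markers; the real list of codec names will differ.
    static final String[] VIDEO_MARKERS = {"Windows Media Video"};
    static final String[] AUDIO_MARKERS = {"Windows Media Audio"};

    /** Returns a guessed media type, or null if the data does not start with the ASF magic. */
    static String detect(byte[] data) {
        if (data.length < ASF_MAGIC.length
                || !Arrays.equals(Arrays.copyOf(data, ASF_MAGIC.length), ASF_MAGIC)) {
            return null;
        }
        byte[] window = Arrays.copyOf(data, Math.min(data.length, WINDOW));
        // Video markers are checked first: a video file normally carries audio too.
        if (containsAny(window, VIDEO_MARKERS)) {
            return "video/x-ms-wmv";
        }
        // An unknown video codec falls through to here, which is the reported bug.
        if (containsAny(window, AUDIO_MARKERS)) {
            return "audio/x-ms-wma";
        }
        return "video/x-ms-asf"; // generic ASF fallback (assumed)
    }

    private static boolean containsAny(byte[] haystack, String[] markers) {
        for (String marker : markers) {
            if (indexOf(haystack, marker.getBytes(StandardCharsets.UTF_16LE)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Naive byte-array search; fine for an 8 KB window.
    private static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }
}
```

A file whose video codec name is missing from VIDEO_MARKERS but whose audio codec name is in
AUDIO_MARKERS comes back as audio, which mirrors the misdetection reported in this issue.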

I've added the string for this codec in r1349906, so detection now works properly for your
file. Longer term, we do need someone to write a proper ASF-aware detector.
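
As a rough idea of what a container-aware check could look like, the sketch below walks the
top-level objects of the ASF header and inspects the stream type of each Stream Properties
Object, instead of searching for codec names. It is only an assumption about one possible
approach, not an existing Tika class: the class and method names are made up, the media types
returned are assumed, and it ignores stream information nested inside other header objects.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of a container-aware ASF check; not an existing Tika detector. */
final class AsfStreamTypeDetector {
    // GUIDs in on-disk byte order, as defined by the ASF specification.
    static final byte[] HEADER = hex("3026B2758E66CF11A6D900AA0062CE6C");
    static final byte[] STREAM_PROPERTIES = hex("9107DCB7B7A9CF118EE600C00C205365");
    static final byte[] AUDIO_MEDIA = hex("409E69F84D5BCF11A8FD00805F5C442B");
    static final byte[] VIDEO_MEDIA = hex("C0EF19BC4D5BCF11A8FD00805F5C442B");

    private static final int HEADER_OBJECT_LENGTH = 30; // GUID + size + count + 2 reserved bytes
    private static final int OBJECT_PREFIX_LENGTH = 24; // GUID + 64-bit little-endian size

    /** Returns a guessed media type, or null if the data is not ASF. */
    static String detect(byte[] data) {
        if (data.length < HEADER_OBJECT_LENGTH || !matches(data, 0, HEADER)) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        // Never read past the header object or past the bytes we were given.
        long end = Math.min(buf.getLong(16), data.length);
        boolean sawAudio = false;
        int pos = HEADER_OBJECT_LENGTH;
        while (pos + OBJECT_PREFIX_LENGTH <= end) {
            long size = buf.getLong(pos + 16);
            if (size < OBJECT_PREFIX_LENGTH || size > end - pos) {
                break; // malformed or truncated object, stop walking
            }
            if (matches(data, pos, STREAM_PROPERTIES) && size >= OBJECT_PREFIX_LENGTH + 16) {
                // The stream type GUID is the first field of the object's payload.
                if (matches(data, pos + OBJECT_PREFIX_LENGTH, VIDEO_MEDIA)) {
                    return "video/x-ms-wmv"; // any video stream makes it a video file
                }
                if (matches(data, pos + OBJECT_PREFIX_LENGTH, AUDIO_MEDIA)) {
                    sawAudio = true;
                }
            }
            pos += (int) size;
        }
        return sawAudio ? "audio/x-ms-wma" : "video/x-ms-asf";
    }

    private static boolean matches(byte[] data, int offset, byte[] guid) {
        if (offset + guid.length > data.length) {
            return false;
        }
        for (int i = 0; i < guid.length; i++) {
            if (data[offset + i] != guid[i]) {
                return false;
            }
        }
        return true;
    }

    private static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```

Because this looks at what kind of streams the file declares rather than at codec names, a
new video codec would not need a new magic string. A real detector would also have to cope
with streaming input and with header objects this sketch skips over.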
> Windows Media Video file detected as Windows Media Audio
> --------------------------------------------------------
>                 Key: TIKA-939
>                 URL: https://issues.apache.org/jira/browse/TIKA-939
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.1
>         Environment: Microsoft's Expression Encoder 4 SP1
>            Reporter: Emil Burzo
>            Priority: Minor
>             Fix For: 1.2
>         Attachments: test.wmv
> Attached file is detected as "audio/x-ms-wma" instead of "video/x-ms-wmv".
> Expected result:
> $ java -jar tika-app-1.1.jar -d test.wmv 
> video/x-ms-wmv
> Actual result:
> $ java -jar tika-app-1.1.jar -d test.wmv 
> audio/x-ms-wma


