tika-dev mailing list archives

From "Ray Gauss II (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1177) Add Matroska (mkv, mka) format detection
Date Fri, 04 Oct 2013 19:04:42 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Gauss II resolved TIKA-1177.

       Resolution: Fixed
    Fix Version/s: 1.5

Unfortunately, that magic doesn't seem to be required in all MKV files.  I tried several utilities
to convert various sources to MKV, and none of the resulting files contained that magic.

A magic value of {{0x1A45DFA3}} is present, but that's also present in WebM, which is based
on Matroska.

I've added Matroska mime-types based on file extension only for now, and also added the WebM mime-type.

We can open other issues, linked to this one, for data detection of MKV and WebM files if
need be.
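
For reference, here is a rough sketch of what content-based detection could look like. It is not what r1529260 does and it is not Tika code. It relies on the EBML layout: the reporter's signature appears to be the EBML header ID ({{0x1A45DFA3}}), a header size of 0x93, and then the DocType element ({{0x4282}}, size 0x88, "matroska"), so it only matches files whose header has exactly that size and puts DocType first. The sketch instead walks the header elements until it finds DocType. The function names and the DocType-to-MIME mapping are made up for illustration.

```python
EBML_MAGIC = bytes.fromhex("1A45DFA3")  # EBML header element ID
DOCTYPE_ID = bytes.fromhex("4282")      # DocType element ID

# Hypothetical mapping for illustration; not taken from Tika's registry.
DOCTYPE_TO_MIME = {"matroska": "video/x-matroska", "webm": "video/webm"}


def vint_length(first_byte):
    """Byte length of an EBML variable-length integer, from its first byte."""
    for length in range(1, 9):
        if first_byte & (0x80 >> (length - 1)):
            return length
    raise ValueError("invalid EBML variable-length integer")


def read_size(data, pos):
    """Decode an element size at pos; return (size, bytes_consumed)."""
    length = vint_length(data[pos])
    if pos + length > len(data):
        raise ValueError("truncated size field")
    value = data[pos] & (0xFF >> length)  # drop the length-marker bit
    for byte in data[pos + 1:pos + length]:
        value = (value << 8) | byte
    return value, length


def sniff_doctype(head):
    """Return the EBML DocType found in the leading bytes, or None."""
    if not head.startswith(EBML_MAGIC):
        return None
    try:
        pos = len(EBML_MAGIC)
        header_size, used = read_size(head, pos)
        pos += used
        end = min(pos + header_size, len(head))
        while pos < end:
            id_len = vint_length(head[pos])  # IDs keep their marker bit
            element_id = head[pos:pos + id_len]
            pos += id_len
            size, used = read_size(head, pos)
            pos += used
            if element_id == DOCTYPE_ID:
                raw = head[pos:pos + size]
                if len(raw) < size:
                    return None  # DocType value cut off
                return raw.rstrip(b"\x00").decode("ascii", errors="replace")
            pos += size  # skip elements we do not care about
    except (IndexError, ValueError):
        return None
    return None


# The reporter's 16-byte signature, and a header that stores EBMLVersion
# before DocType (so the fixed signature does not match it).
reporter_magic = bytes.fromhex("1A45DFA3934282886d6174726f736b61")
other_header = bytes.fromhex("1A45DFA38F42868101428288") + b"matroska"

for sample in (reporter_magic, other_header):
    doctype = sniff_doctype(sample)
    print(sample.startswith(reporter_magic), doctype, DOCTYPE_TO_MIME.get(doctype))
```

Note that DocType alone cannot tell .mkv from .mka, since both use "matroska", so the extension globs would still be needed for that distinction.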

Resolved in r1529260.

> Add Matroska (mkv, mka) format detection
> ----------------------------------------
>                 Key: TIKA-1177
>                 URL: https://issues.apache.org/jira/browse/TIKA-1177
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.4
>            Reporter: Boris Naguet
>            Assignee: Ray Gauss II
>            Priority: Minor
>             Fix For: 1.5
> There's no mimetype detection for Matroska format, although it's a popular video format.
> Here is some code I added in my custom mimetypes to detect them:
> {code}
> 	<mime-type type="video/x-matroska">
> 		<glob pattern="*.mkv" />
> 		<magic priority="40">
> 			<match value="0x1A45DFA3934282886d6174726f736b61" type="string" offset="0" />
> 		</magic>
> 	</mime-type>
> 	<mime-type type="audio/x-matroska">
> 		<glob pattern="*.mka" />
> 	</mime-type>
> {code}
> I found the signature for the mkv on: 
> http://www.garykessler.net/library/file_sigs.html
> I was not able to find it clearly for mka, but detection by filename is still useful.
> Although, the full spec is available here:
> http://matroska.org/technical/specs/index.html
> Maybe it's a bit more complex than this constant magic, but it works on my test files.
