tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1180) Matroska (mkv, mka, webm) Detector
Date Sat, 05 Oct 2013 18:33:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787291#comment-13787291 ]

Nick Burch commented on TIKA-1180:

There is a Java library, https://github.com/Matroska-Org/jebml, but it's LGPL (so we can't
ship it as a dependency) and apparently no longer maintained. We may need to roll our own
simple code based on the file format specs, as we've ended up doing for things like MP3.
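
As a rough illustration of how small such hand-rolled code could be, here is a minimal sketch that reads the EBML header at the start of a stream and returns its DocType string (typically "matroska" or "webm"). The class and method names are hypothetical and not part of Tika. It only covers the DocType step: telling mkv from mka would additionally need a look at the track entries, which this sketch does not attempt.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not a Tika class): sniffs the EBML DocType of a stream. */
final class EbmlDocTypeSniffer {
    private static final int EBML_HEADER_ID = 0x1A45DFA3; // EBML header element ID
    private static final long DOC_TYPE_ID = 0x4282;       // DocType element ID
    private static final int MAX_HEADER = 4096;           // arbitrary safety cap (assumption)

    /** Returns the DocType (e.g. "matroska", "webm"), or null if it cannot be found. */
    static String sniffDocType(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        try {
            if (data.readInt() != EBML_HEADER_ID) {
                return null; // not an EBML file
            }
            long headerSize = readVint(data, true);
            if (headerSize < 0 || headerSize > MAX_HEADER) {
                return null;
            }
            byte[] header = new byte[(int) headerSize];
            data.readFully(header);

            // Walk the child elements of the header: (id, size, payload) triples.
            DataInputStream body = new DataInputStream(new ByteArrayInputStream(header));
            while (body.available() > 0) {
                long id = readVint(body, false);
                long size = readVint(body, true);
                if (id < 0 || size < 0 || size > body.available()) {
                    return null; // malformed header
                }
                byte[] payload = new byte[(int) size];
                body.readFully(payload);
                if (id == DOC_TYPE_ID) {
                    // trim() also drops any trailing NUL padding
                    return new String(payload, StandardCharsets.US_ASCII).trim();
                }
            }
            return null;
        } catch (EOFException e) {
            return null; // stream ended before the header was complete
        }
    }

    /**
     * Reads an EBML variable-length integer. The number of leading zero bits in the
     * first byte gives the encoded length. Element IDs keep the length marker bit;
     * element sizes have it stripped. Returns -1 for an unsupported first byte of 0.
     */
    private static long readVint(DataInputStream in, boolean stripMarker) throws IOException {
        int first = in.readUnsignedByte();
        if (first == 0) {
            return -1;
        }
        int length = Integer.numberOfLeadingZeros(first) - 24 + 1; // 1..8 bytes
        long value = stripMarker ? (first & (0xFF >> length)) : first;
        for (int i = 1; i < length; i++) {
            value = (value << 8) | in.readUnsignedByte();
        }
        return value;
    }
}
```

A real Detector would presumably mark and reset the stream around this call and map the returned string onto the mimetypes added in TIKA-1177.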

> Matroska (mkv, mka, webm) Detector
> ----------------------------------
>                 Key: TIKA-1180
>                 URL: https://issues.apache.org/jira/browse/TIKA-1180
>             Project: Tika
>          Issue Type: New Feature
>          Components: detector
>    Affects Versions: 1.5
>            Reporter: Nick Burch
>
> Following the work on TIKA-1177, we now have mimetype entries for the various formats
> which are based on the Matroska container (mkv, mka, webm etc). However, we are unable to
> properly identify the specific type just from some mime magic.
>
> Instead, for fully accurate detection, we'll need a new Detector for the Matroska family,
> which does some very simple container/stream processing to work out what the container contains.

This message was sent by Atlassian JIRA
