tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2194) matlab files detected as 'text/plain'
Date Mon, 12 Dec 2016 06:01:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741077#comment-15741077 ]

Nick Burch commented on TIKA-2194:

Matlab files lack a unique magic pattern at the start, which makes MIME magic detection rather
tricky. You really need to pass in the filename too, so that Tika has a good chance
of returning the right mimetype.
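To illustrate why the filename matters, here is a minimal, stdlib-only Java sketch of name-hinted detection. It does not use Tika. It first looks at the leading bytes (magic), and only when that yields the generic text/plain does it use the file extension to narrow the result. The class NameHintedDetector and the type names text/x-matlab and application/x-matlab-data are hypothetical and used only for illustration. The MAT-file check relies on the fact that Level 5 MAT-files begin with the ASCII text "MATLAB 5.0 MAT-file", which shows the contrast with .m scripts that have no such signature. In Tika itself, the equivalent step is to supply the name alongside the stream, for example via Tika.detect(InputStream, String).

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical sketch of magic detection refined by a filename hint.
 * Type names are illustrative assumptions, not Tika's registry.
 */
final class NameHintedDetector {

    // Level 5 MAT-files start with this ASCII header, so magic alone can identify them.
    private static final byte[] MAT5_MAGIC =
            "MATLAB 5.0 MAT-file".getBytes(StandardCharsets.US_ASCII);

    /** Detects a type from the leading bytes only, falling back to generic types. */
    static String detectByMagic(byte[] head) {
        if (startsWith(head, MAT5_MAGIC)) {
            return "application/x-matlab-data";
        }
        return looksLikeText(head) ? "text/plain" : "application/octet-stream";
    }

    /** Refines the magic result with the file name, if one was supplied. */
    static String detect(byte[] head, String name) {
        String byMagic = detectByMagic(head);
        if (name == null || !"text/plain".equals(byMagic)) {
            return byMagic; // no hint available, or magic was already specific
        }
        // A .m script has no unique leading signature, so only the extension can narrow it.
        if (name.toLowerCase(Locale.ROOT).endsWith(".m")) {
            return "text/x-matlab";
        }
        return byMagic;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Crude heuristic: treat the sample as text if it contains no NUL bytes.
    private static boolean looksLikeText(byte[] data) {
        for (byte b : data) {
            if (b == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] script = "function y = f(x)\n  y = x.^2;\nend\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(script, null));  // text/plain
        System.out.println(detect(script, "f.m")); // text/x-matlab
    }
}
```

The same bytes produce text/plain without a name and a more specific type once the ".m" hint is available, which matches the behaviour described in the comment: for formats without a distinctive header, the filename is the only extra signal a detector has.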

> matlab files detected as 'text/plain'
> -------------------------------------
>                 Key: TIKA-2194
>                 URL: https://issues.apache.org/jira/browse/TIKA-2194
>             Project: Tika
>          Issue Type: Bug
>          Components: detector, mime
>    Affects Versions: 1.9, 1.14
>            Reporter: Mihai Glont
> matlab files from https://issues.apache.org/jira/browse/TIKA-1634 are reported to have
> mime type 'text/plain' with either DefaultDetector or MimeTypes. I am able to reproduce the
> problem by running the following Groovy script: https://gist.github.com/mglont/16630c8a66fdddaaa7aa44820d6f021f

This message was sent by Atlassian JIRA
