tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-289) Add magic byte patterns from file(1)
Date Sun, 01 Mar 2015 17:51:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342347#comment-14342347 ]

Nick Burch commented on TIKA-289:

As of r1663136, you can now run the Tika CLI with the option {{--compare-file-magic=<dir>}}
to have the Tika mime types compared against a File(1) magic directory. This will report the
mime types known to File(1) but not to Tika, and the ones for which File(1) has magic but
Tika doesn't, plus some summary statistics.
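
To make the two reports concrete, here is a minimal sketch of the kind of comparison described above. It is not the code from r1663136: it is written in Python for brevity, the function names and sample data are hypothetical, and it assumes only that File(1) magic source files declare mime types on {{!:mime}} annotation lines.

```python
import re

# Matches file(1) "!:mime" annotation lines, e.g. "!:mime image/png"
MIME_LINE = re.compile(r"^!:mime\s+(\S+)")


def file_magic_mimes(magic_text):
    """Collect the mime types declared on '!:mime' lines of magic source text."""
    found = set()
    for line in magic_text.splitlines():
        match = MIME_LINE.match(line)
        if match:
            found.add(match.group(1).lower())
    return found


def compare(file_mimes, tika_mimes, tika_with_magic):
    """Return (types File(1) knows but Tika lacks,
               types both know but for which Tika has no magic)."""
    missing_types = sorted(file_mimes - tika_mimes)
    missing_magic = sorted((file_mimes & tika_mimes) - tika_with_magic)
    return missing_types, missing_magic


# Hypothetical sample data, for illustration only
sample_magic = "\n".join([
    "0 string GIF8 GIF image data",
    "!:mime image/gif",
    "0 string fLaC FLAC audio",
    "!:mime audio/flac",
    "0 string PNG PNG image data",
    "!:mime image/png",
])
tika_mimes = {"image/gif", "image/png"}   # types Tika knows (assumed)
tika_with_magic = {"image/gif"}           # Tika types that have magic (assumed)

file_mimes = file_magic_mimes(sample_magic)
missing_types, missing_magic = compare(file_mimes, tika_mimes, tika_with_magic)
print("Known to File(1) but not Tika:", missing_types)
print("File(1) has magic, Tika doesn't:", missing_magic)
print("Summary: %d File(1) types, %d missing in Tika, %d missing magic"
      % (len(file_mimes), len(missing_types), len(missing_magic)))
```

Both reports are plain set differences, so the same approach can be re-run against a newer File(1) magic directory to spot newly added types.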

Hopefully others can use that soon-ish to add in some of the missing types, and the missing
magics for known types. Longer term, we can use it to track when File(1) adds new types that
we might want to add in too.

> Add magic byte patterns from file(1)
> ------------------------------------
>                 Key: TIKA-289
>                 URL: https://issues.apache.org/jira/browse/TIKA-289
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>            Reporter: Jukka Zitting
>            Priority: Minor
>         Attachments: file-has-magic-tika-missing.txt, file-mimes-missing.txt
> As discussed in TIKA-285, the file(1) command comes with a pretty comprehensive set of
> magic byte patterns. It would be nice to get those patterns included also in Tika.
