tika-dev mailing list archives

From "Jukka Zitting (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-86) Support magic(5) files
Date Mon, 16 Jan 2012 17:11:40 GMT

     [ https://issues.apache.org/jira/browse/TIKA-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-86.

    Resolution: Won't Fix

Agreed with the points above, so resolving as Won't Fix. Let's follow up in separate issues
on more actionable tasks.

I looked at magic file parsing on a few occasions, but as noted most of the magic files out
there are targeted at human-readable output and don't contain very comprehensive or accurate
media type information. Matching such input to the needs of Tika seems more trouble than it's
worth.

That said, some of the more complicated detection rules (like the regexp patterns mentioned
above) could well be useful for Tika. I'd love to see contributions in that area! That would
allow us to mine some of the larger magic files for specific complex patterns for reuse in
our type database.
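
To make the idea concrete, here is a minimal sketch of regexp-based detection
in Java. It is not Tika's API: the class RegexMagicSketch, the Rule holder, the
detect method and the three patterns are all hypothetical and chosen only for
illustration. It assumes the caller has already read a short prefix of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of Tika: match regexps against a file prefix.
final class RegexMagicSketch {

    // One detection rule: a pattern applied to the prefix, and the type it implies.
    static final class Rule {
        final Pattern pattern;
        final String mediaType;

        Rule(String regex, String mediaType) {
            this.pattern = Pattern.compile(regex);
            this.mediaType = mediaType;
        }
    }

    // Illustrative rules only; a real type database would be far larger.
    private static final List<Rule> RULES = Arrays.asList(
        new Rule("^%PDF-\\d\\.\\d", "application/pdf"),
        new Rule("^#!\\s*/bin/(ba)?sh\\b", "application/x-sh"),
        new Rule("^<\\?xml\\s", "application/xml"));

    // Returns the media type of the first matching rule, or a generic fallback.
    static String detect(byte[] prefix) {
        // ISO-8859-1 maps every byte to exactly one char, so arbitrary
        // binary input can be decoded without loss before matching.
        String head = new String(prefix, StandardCharsets.ISO_8859_1);
        for (Rule rule : RULES) {
            if (rule.pattern.matcher(head).find()) {
                return rule.mediaType;
            }
        }
        return "application/octet-stream";
    }
}
```

Rules are tried in order, so more specific patterns would need to come before
more general ones.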

> Support magic(5) files
> ----------------------
>                 Key: TIKA-86
>                 URL: https://issues.apache.org/jira/browse/TIKA-86
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>            Reporter: Jukka Zitting
>
> Tika should have a parser for the magic(5) file format used by the file(1) command. Then
> we could use existing magic rules from places like http://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/magic.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira