tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-95) Pluggable magic header detectors
Date Fri, 20 May 2011 16:54:48 GMT

     [ https://issues.apache.org/jira/browse/TIKA-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-95.
-------------------------------

    Resolution: Duplicate
      Assignee: Jukka Zitting

This got implemented as a part of TIKA-447, so resolving as a duplicate.

> Pluggable magic header detectors
> --------------------------------
>
>                 Key: TIKA-95
>                 URL: https://issues.apache.org/jira/browse/TIKA-95
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>            Priority: Minor
>
> Some file formats like MS Office files or specific XML schemas don't have simple magic
> marker bytes that could be used to easily identify the type of the document. However, it would
> in many cases be possible to detect such formats with more complex parsing logic.
>
> Also, there are some external libraries (like Sanselan as mentioned in TIKA-92) that
> contain their own magic header rules. Instead of duplicating such rules in Tika, it would
> be better if Tika could just invoke the existing external functionality.
>
> To support these cases Tika should provide a mechanism to plug in custom magic header
> detector components in addition to the traditional configured magic patterns.
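
For illustration only, here is a minimal sketch of what such a plug-in mechanism could look like. It is not the code that went in with TIKA-447 and it does not use Tika's API; the names MagicDetector, PrefixDetector, XmlRootDetector and DetectorRegistry are all hypothetical. It shows a traditional fixed-prefix pattern and a custom detector with slightly more parsing logic registered side by side, with the first match winning.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical plug-in point (an assumption for this sketch, not Tika's API).
interface MagicDetector {
    // Returns a media type name if the header bytes are recognized, else empty.
    Optional<String> detect(byte[] header);
}

// "Traditional" configured pattern: a fixed byte prefix mapped to a type.
final class PrefixDetector implements MagicDetector {
    private final byte[] magic;
    private final String type;

    PrefixDetector(byte[] magic, String type) {
        this.magic = magic.clone();
        this.type = type;
    }

    @Override
    public Optional<String> detect(byte[] header) {
        if (header.length < magic.length) {
            return Optional.empty();
        }
        boolean matches = Arrays.equals(Arrays.copyOf(header, magic.length), magic);
        return matches ? Optional.of(type) : Optional.empty();
    }
}

// Custom logic: skips an optional XML declaration, then checks the root element.
// Deliberately naive (UTF-8 only, simple prefix match on the element name).
final class XmlRootDetector implements MagicDetector {
    private final String rootElement;
    private final String type;

    XmlRootDetector(String rootElement, String type) {
        this.rootElement = rootElement;
        this.type = type;
    }

    @Override
    public Optional<String> detect(byte[] header) {
        String text = new String(header, StandardCharsets.UTF_8).trim();
        if (text.startsWith("<?xml")) {
            int end = text.indexOf("?>");
            if (end < 0) {
                return Optional.empty();
            }
            text = text.substring(end + 2).trim();
        }
        return text.startsWith("<" + rootElement) ? Optional.of(type) : Optional.empty();
    }
}

// Tries each registered detector in order; the first match wins.
final class DetectorRegistry {
    private final List<MagicDetector> detectors = new ArrayList<>();

    void register(MagicDetector detector) {
        detectors.add(detector);
    }

    String detect(byte[] header) {
        for (MagicDetector detector : detectors) {
            Optional<String> result = detector.detect(header);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return "application/octet-stream"; // generic fallback for unknown input
    }
}
```

A detector that wraps an external library would implement the same interface and delegate to that library, so its rules would not need to be duplicated as configured patterns.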

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
