tika-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-6) Port Nutch (or better) MimeType detection system into Tika
Date Thu, 20 Sep 2007 16:08:31 GMT

    [ https://issues.apache.org/jira/browse/TIKA-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529135 ]

Jukka Zitting commented on TIKA-6:

I don't think we can include the freedesktop.org.xml file in Tika. AFAIK the database originates
from http://www.freedesktop.org/wiki/Software/shared-mime-info and is distributed under the
GPL. What we can do instead is to provide a configuration option (with a reasonable default)
for a user to point Tika to the mime database file already available on a system. This way
we don't need to include the viral component within Tika releases.
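
A minimal sketch of what such a configuration option could look like, not a proposal for the actual API. The property name `tika.mime.database` and the class `MimeDatabaseLocator` are hypothetical; the default path is where shared-mime-info typically installs the file on Linux systems, which is an assumption about the target platform.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: resolves the location of an externally installed mime database. */
final class MimeDatabaseLocator {

    /** Hypothetical configuration key; not an existing Tika setting. */
    static final String PROPERTY = "tika.mime.database";

    /** Assumed default: the usual shared-mime-info install location on Linux. */
    static final String DEFAULT_PATH = "/usr/share/mime/packages/freedesktop.org.xml";

    private MimeDatabaseLocator() {
    }

    /** Returns the configured path, falling back to the default when the key is unset. */
    static Path configuredPath(Properties config) {
        return Paths.get(config.getProperty(PROPERTY, DEFAULT_PATH));
    }

    /** Returns the database path only if the file exists and is readable. */
    static Optional<Path> locate(Properties config) {
        Path path = configuredPath(config);
        return Files.isReadable(path) ? Optional.of(path) : Optional.empty();
    }
}
```

A caller would pass its configuration to `MimeDatabaseLocator.locate(...)` and fall back to some other detection strategy when the result is empty, so nothing from the GPL-licensed database has to ship with a release.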

Some of the source files have the following license header. I guess it's an oversight and
easily resolved.

    //Copyright (c) 2007, California Institute of Technology.
    //ALL RIGHTS RESERVED. U.S. Government sponsorship acknowledged.

Also, we should update the license headers to the latest version available at http://www.apache.org/legal/src-headers.html.

> Port Nutch (or better) MimeType detection system into Tika
> ----------------------------------------------------------
>                 Key: TIKA-6
>                 URL: https://issues.apache.org/jira/browse/TIKA-6
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.1-incubator
>         Environment: Improvement is indep. of environment
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubator
>         Attachments: TIKA-6.Mattmann.091907.patch.txt
> This patch will contribute a MimeType detection system for Tika, including a MimeType data
> structure and associated content-detection facilities. This will be based on Nutch's MimeType
> system as a baseline; however, I'm open to suggestions. Jerome Charron mentioned that he had
> an implementation of a MimeType system based on FreeDesktop.org's system. We should look into
> this as well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
