tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-6) Port Nutch (or better) MimeType detection system into Tika
Date Thu, 20 Sep 2007 14:40:31 GMT

    [ https://issues.apache.org/jira/browse/TIKA-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529120 ]

Chris A. Mattmann commented on TIKA-6:
--------------------------------------

Hi Bertrand:

Thanks for reviewing the patch. I should have made this clearer earlier, so I apologize.
The "Jerome" referenced in my prior comment is Jerome Charron. Jerome and I were the ones who
originally came up with the idea for Tika while working on the Nutch project, and we proposed
it to the list then. Jerome is one of the Nutch committers, as you noted above, and had earlier
contributed a MimeType repository capability to Nutch, but had not updated it for a while.
Jerome had told me that he was working on a new version based on Freedesktop.org's mime system.

With respect to question #1: AFAIK, it's OK to distribute the freedesktop.org mime
database with our system because it's GPL.

With respect to question #2, I'm -1 on making the mime database a separate module. It was
intended to be one of the core functionalities and contributions of the Tika library, and it adds
a lot of value to a content analysis toolkit (mime detection is an integral part of it).
Jerome and I had originally thought about taking the mime DB (before Tika) to Jakarta Commons.
However, that idea evolved from providing a simple mime detection library into a full-fledged
content analysis toolkit library that includes mime detection: Tika.

> Port Nutch (or better) MimeType detection system into Tika
> ----------------------------------------------------------
>
>                 Key: TIKA-6
>                 URL: https://issues.apache.org/jira/browse/TIKA-6
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.1-incubator
>         Environment: Improvement is indep. of environment
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubator
>
>         Attachments: TIKA-6.Mattmann.091907.patch.txt
>
>
> This patch will contribute a MimeType detection system for Tika, including a MimeType data
structure and associated content-detection facilities. This will be based on Nutch's MimeType
system as a baseline; however, I'm open to suggestions. Jerome Charron mentioned that he had
an implementation of a MimeType system based on FreeDesktop.org's system. We should look into
this as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

