tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-6) Port Nutch (or better) MimeType detection system into Tika
Date Sat, 03 Nov 2007 05:43:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539818 ]

Chris A. Mattmann commented on TIKA-6:

Hi Guys,

 Thanks Thilo, Bertrand and Jukka for chiming in. I'm not exactly sure how
to proceed either since I'm better at developing code than sorting out legal
issues :)

 I can only speak from my analogous experience within Nutch here in which
the Eclipse.org DTD format, more or less, is the basis of the
(near-equivalent) Nutch plugin.xml format. Nutch has its own plugin.dtd
(that I helped to draft, along with Jerome Charron), based on the Nutch
plugin system developed primarily by Stefan Groschupf. Essentially, Stefan
developed a set of code to interpret plugin.xml files, and do things with
their data (process it, reformat it, make it available, etc.) to other Nutch
subsystems. This entire code base was developed specifically for Nutch. The
only thing that was reused was the plugin DTD's "model" (perhaps modulo an
attribute field or two in one of the tags).

 Here, we have a very similar case. Jerome Charron has clearly used the
shared mime-info DTD format from freedesktop.org as the "data model" for his
mime type db format. However, the entire code-base that reads, interprets,
reformats, and processes the information made available by the mime type
database was written by Jerome. Also, Jerome's mime DTD (and associated
mime XML format) provided in TIKA-6 may well add some new capability
(e.g., I notice that the Nutch mimes are in this new mime db as well) that
was not present in the original freedesktop.org one.

 So, to me (and mind you, I have pretty much 0% experience interpreting
open-source licenses, etc., as well as interpreting law :) ), this is an
equivalent and acceptable case. I guess what it really boils down to is IP
on a "data model". For instance, must any mime database developed in the
future that contains at least the shared mime-info data elements defined in
the DTD as a subset of its information be restricted to being GPL? I would be
scared/shocked if the answer to that is "yes". To me, it's hard to slap IP
issues onto a "data model". If that were the case, wouldn't any software
developed for an ISO data model standard (e.g., ISO-11179) be required to
adopt whatever the license is that the 11179 model standards body decided?

 My 2 cents,

Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project

Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.

> Port Nutch (or better) MimeType detection system into Tika
> ----------------------------------------------------------
>                 Key: TIKA-6
>                 URL: https://issues.apache.org/jira/browse/TIKA-6
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.1-incubator
>         Environment: Improvement is indep. of environment
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubator
>         Attachments: TIKA-6.Mattmann.091907.patch.txt, TIKA-6.Mattmann.092007.patch.txt
> This patch will contribute a MimeType detection system for Tika, including a MimeType data
> structure and associated content-detection facilities. This will be based on Nutch's MimeType
> system as a baseline; however, I'm open to suggestions. Jerome Charron mentioned that he had
> an implementation of a MimeType system based on FreeDesktop.org's system. We should look into
> this as well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
