tika-dev mailing list archives

From Chris Mattmann <chris.mattm...@jpl.nasa.gov>
Subject Re: [Moved] was: (TIKA-6) Port Nutch (or better) MimeType detection system into Tika
Date Thu, 20 Sep 2007 15:51:02 GMT
Hi Bertrand,

> On 9/20/07, Chris Mattmann <chris.mattmann@jpl.nasa.gov> wrote:
>> ...I propose
>> that we remove the file freedesktop.org.xml, and then rename the DTD file
>> from freedesktop.org.dtd to mime.types.dtd. That way, we remove the
>> freedesktop.org specific stuff, and we're simply using the data model for
>> how to structure the mime database...
> If that's how Nutch did it I guess it is ok, but I'd like to have the
> opinion of others. I seem to remember recent ASF discussions about a
> similar case, I'll see if I can find them.

One last thing on this that came to mind. I just want to clarify: we
aren't using any "code" from the freedesktop.org shared-mime-info system
directly. In TIKA-6, we are simply using the freedesktop.org mime db XML
file format. The code to parse/interact with that information was written
entirely by Jerome and myself under the auspices of our CLAs, so IMO it's
fair game. Again, just to point out, this is consistent with other Apache
projects (e.g., Nutch), where an XML file format (e.g., the Eclipse Plugin
DTD) is adopted, but the code to read/interact with the data is written
specifically for the particular Apache project (as is the case here).

Just an FYI. Others are also welcome to chime in. Thanks!


Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project

Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
