tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-6) Port Nutch (or better) MimeType detection system into Tika
Date Fri, 21 Sep 2007 14:03:50 GMT

    [ https://issues.apache.org/jira/browse/TIKA-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529427 ]

Chris A. Mattmann commented on TIKA-6:

Hi Jukka:

> Some of the files have an @author tag with Hari Kodungallur in addition to Jerome, do
> we have some history on the extent of his involvement?

Good question: I don't know much about Hari's history, but Jerome may. After a bit of
research, the only files with Hari's name in the author tags are MimeType.java and MimeTypeException.java.
These files are more or less based on their Nutch versions, which were contributed by Jerome
and are covered by his CLA. The Nutch versions are here:


As for the spaces, I've updated the files to use Sun's style: 4 space characters per indent. So,
when I commit the patch, that part will be fixed too. As an FYI: do we want to adopt the Sun
convention across the board here? What do others think?

If I don't hear anything else about the patch, I will commit it sometime in the next few hours.



> Port Nutch (or better) MimeType detection system into Tika
> ----------------------------------------------------------
>                 Key: TIKA-6
>                 URL: https://issues.apache.org/jira/browse/TIKA-6
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 0.1-incubator
>         Environment: Improvement is indep. of environment
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 0.1-incubator
>         Attachments: TIKA-6.Mattmann.091907.patch.txt, TIKA-6.Mattmann.092007.patch.txt
> This patch will contribute a MimeType detection system for Tika, including a MimeType data
> structure and associated content-detection facilities. This will be based on Nutch's MimeType
> system as a baseline; however, I'm open to suggestions. Jerome Charron mentioned that he had
> an implementation of a MimeType system based on FreeDesktop.org's system. We should look into
> this as well.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
