tika-dev mailing list archives

From "robert burrell donkin" <robertburrelldon...@gmail.com>
Subject Re: shared MIME info
Date Sun, 08 Jul 2007 18:46:51 GMT
On 7/8/07, Chris Mattmann <chris.mattmann@jpl.nasa.gov> wrote:
> Hi Robert,

hi chris

> Yes, in fact it is. I am currently working on porting an implementation
> similar to that of freedesktop.org, originally intended for the Nutch project.
> It was written by Jerome Charron. I've created an issue in TIKA's JIRA issue
> tracker to discuss this:
> http://issues.apache.org/jira/browse/TIKA-
> I've been sitting on the code for a few weeks now, as I just haven't had
> the time to make much progress porting it. It shouldn't take too much effort,
> though. Please let me know if you'd like to help port it.

quite possibly :-)

seems like we might have a good match but let me explain why i think
there might be some synergy...

RAT is a project comprehension tool built to help me review releases
in the incubator. it works by guessing meta-data from documents and
then analysing or reporting on it. checking headers and licenses, that
sort of thing.

it started out as a hacked-together tool for myself, but it's become
reasonably well used within apache. so it's about time that it moved
on.

i'd like to start running RAT against all the incubator source. this
should make it easier for incubating projects to cut releases and
allow easier oversight of the code base.

RAT has a very basic set of heuristics for determining broad MIME
type. this really needs to be replaced by something better.

- robert
