tika-dev mailing list archives

From Chris Mattmann <chris.mattm...@jpl.nasa.gov>
Subject Re: Getting started with Tika
Date Thu, 06 Dec 2007 23:11:22 GMT
Hi Michael,

Welcome to the Tika list! Glad that you are interested in Tika. I will take
the lead in posting a note on the Google Code project site saying that the
project has been fully migrated to Apache.

The "current status" of the project is taken from:
http://wiki.apache.org/incubator/October2007, based on our October 2007
incubator report. I think the message to take away is that, though Tika is
still in its nascent stages, progress has recently taken shape and the
project is nearing a stable 0.1 release. There is active development (though
it has slowed recently), and issues are being created and worked through in
JIRA.

Right now, the best way to get started with Tika is to check out the unit
tests, available in:


They really do the best job right now of exercising the features of the API.
To get a good idea of what's been contributed/committed to the upcoming 0.1
release, check out the CHANGES.txt file:


In general, the mime type detector, metadata framework, and automatic
parsing framework are currently working. There is a near-stable Parser
interface, along with several Parser implementations that handle MS Word,
MS PowerPoint, PDF, XML, plain text, etc. You can see the generic Parser
interface by going to:


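To make the parser framework idea a bit more concrete, here is a rough Java
sketch of the general pattern: one small parser interface, several
implementations, and a registry that picks an implementation by MIME type.
Please treat it as an illustration only. SimpleParser, PlainTextParser, and
ParserRegistrySketch are hypothetical names, and this is not Tika's actual
Parser interface (check the interface itself for the real signature).

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified sketch of a pluggable parser framework.
// This is NOT Tika's actual Parser interface; all names are illustrative.
public class ParserRegistrySketch {

    // One method: turn raw bytes into text, recording what was learned in metadata.
    interface SimpleParser {
        String parse(byte[] content, Map<String, String> metadata);
    }

    // Hypothetical plain-text implementation: decodes the bytes as UTF-8.
    static class PlainTextParser implements SimpleParser {
        @Override
        public String parse(byte[] content, Map<String, String> metadata) {
            metadata.put("Content-Type", "text/plain");
            return new String(content, StandardCharsets.UTF_8);
        }
    }

    // Registry mapping a MIME type to the parser that handles it.
    private final Map<String, SimpleParser> parsers = new HashMap<>();

    void register(String mimeType, SimpleParser parser) {
        parsers.put(mimeType, parser);
    }

    // Dispatches to the registered parser, or fails if none handles the type.
    String parse(String mimeType, byte[] content, Map<String, String> metadata) {
        SimpleParser parser = parsers.get(mimeType);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for " + mimeType);
        }
        return parser.parse(content, metadata);
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        registry.register("text/plain", new PlainTextParser());
        Map<String, String> metadata = new HashMap<>();
        String text = registry.parse("text/plain",
                "Hello, Tika".getBytes(StandardCharsets.UTF_8), metadata);
        System.out.println(text + " " + metadata);
    }
}
```

The point of the registry is that callers depend only on the interface, so
supporting a new format means adding one implementation and one register()
call.
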
Tika is also currently used within Nutch as the mime type detection
framework, since the commit of NUTCH-562 [1]. Checking out Nutch will give
you an idea of how the mime framework works.
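
Mime type detection of this kind typically inspects the first few bytes of a
file for known signatures ("magic"). The Java sketch below assumes just two
signatures, %PDF- for PDF and <?xml for XML. MagicDetectorSketch and detect()
are hypothetical names, and the sketch neither uses nor reproduces Tika's own
detector API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified sketch of magic-byte MIME detection.
// Not Tika's detection API; it only checks two well-known signatures.
public class MagicDetectorSketch {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    // The XML declaration is optional, so this check misses some XML files.
    private static final byte[] XML_MAGIC = "<?xml".getBytes(StandardCharsets.US_ASCII);

    // Returns true if data begins with the given prefix bytes.
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Guesses a MIME type from the leading bytes, falling back to a generic binary type.
    static String detect(byte[] data) {
        if (startsWith(data, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(data, XML_MAGIC)) {
            return "application/xml";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample)); // prints application/pdf
    }
}
```

A real detector needs many more signatures and usually a fallback on file
name patterns, since magic bytes alone are not always present or decisive.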

If you have any further questions, please let me (and others on this list)
know. We'd love to help. Again, welcome!


[1] http://issues.apache.org/jira/browse/NUTCH-562

On 12/6/07 2:38 PM, "Michael Wechner" <michael.wechner@wyona.com> wrote:

> Hi
> I guess Tika has been fully "migrated" from
> http://code.google.com/p/tika/
> to
> http://incubator.apache.org/tika/index.html
> right? If so, then I would suggest adding a note to the Google Code site
> or closing the project at Google (if possible).
> Also, I wanted to ask: what's the best way to get started with Tika?
> I tried to find some documentation, but didn't really find anything like
> "a first hops example".
> Thanks in advance for any pointers
> Michael

Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
