tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Second Tika report
Date Wed, 09 May 2007 15:59:30 GMT

I've prepared the following as the Tika report for this month.

Tika is a toolkit for detecting and extracting metadata and structured
text content from various documents using existing parser libraries.
Tika entered incubation on March 22nd, 2007.


We had a good project bootstrap meeting as part of the text analysis
BOF at ApacheCon EU in Amsterdam. The resulting ideas were
summarized on the project mailing list, and the first design threads
have started.


We've started discussing the design of the Tika toolkit. It seems like
we will select one of the existing codebases listed in the project
proposal as the basis of an early 0.1 release, and start refactoring
the code into a more generic toolkit. The Tika svn tree is still
empty, but I expect us to see the first code commits before the next
report.


All the initial infrastructure is now in place. There is still some
activity on the temporary Tika wiki on the Google Project hosting
service, so we may end up requesting that a Tika wiki be set up on
the ASF infrastructure.

Issues before graduation

The Tika project is still at an early stage of incubation. The most
important tasks before graduation are to develop and release the Tika
codebase and to grow a diverse and sustainable project community.


Jukka Zitting
