tika-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject Contribute more code for TIKA
Date Sat, 15 Nov 2008 18:02:53 GMT
Hello people,

As noted in my reply to the ODF thread, I think the SAX design of TIKA is
really great. I submitted two patches that extend TIKA. If you like my
work and would like more SAX-enabled document parsers (such as one for
the Office 2007 OpenXML formats), just let me know. I am rather new to your
project and your coding style, but I hope my patches look good to you. They
still need some JavaDocs, but this is a start.

If you are interested, I could also work with SVN directly (I created my
patches with SVN) and commit my code myself. Just let me know whether you
would allow that and what your workflows are.

I am currently a core member of two other open source projects:

- Inventor of http://www.panFMP.org (a metadata portal that uses Lucene). A
metadata portal based on it (http://sedis.iodp.org) needed full-text
support, so I started to study TIKA, but I ran into problems with whitespace
during indexing and with missing parsers for OpenXML and a correctly working
ODF parser. If you look at the source code of panFMP, you will see that it
sometimes mixes SAX and DOM (in one parser!), using Commons Digester for
that. When I saw your MatchingContentHandler with streaming XPath support,
I started thinking about rewriting the Digester code in panFMP! (You can see
how cool your implementation is :-] ).

- Maintainer of the Sun Java System Webserver SAPI for PHP

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
eMail: uwe@thetaphi.de
