tika-dev mailing list archives

From Jeremias Maerki <...@jeremias-maerki.ch>
Subject Re: Support for document libraries
Date Tue, 10 Jul 2007 15:40:02 GMT
Adding document format libraries as subprojects of Tika would still "hide"
them somewhat, so this wouldn't really solve the problem of easily
finding such libraries. If new libraries are to be developed, I think
a lab or Commons would be better suited.

There have been many discussions over the years about creating an image
library inside the ASF, but it has never developed into a real effort. It's
a lot of work, and with ImageIO built into the JDK, only exotic wishes remain
unfulfilled.
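A minimal sketch of what the JDK's built-in image support covers: the standard `javax.imageio.ImageIO` class can report which formats have registered readers and writers. The class name `ImageIOFormats` is hypothetical, and the exact list depends on the JDK version and any installed plugins (TIFF support, for example, was only added to the JDK in Java 9).

```java
import javax.imageio.ImageIO;
import java.util.Set;
import java.util.TreeSet;

public class ImageIOFormats {
    // ImageIO reports names in several spellings (e.g. "png" and "PNG"),
    // so lower-case them into a sorted set to get one entry per format name.
    static Set<String> normalize(String[] names) {
        Set<String> out = new TreeSet<>();
        for (String n : names) {
            out.add(n.toLowerCase());
        }
        return out;
    }

    public static void main(String[] args) {
        // Formats for which an ImageReader / ImageWriter is registered.
        Set<String> readable = normalize(ImageIO.getReaderFormatNames());
        Set<String> writable = normalize(ImageIO.getWriterFormatNames());
        System.out.println("Readable: " + readable);
        System.out.println("Writable: " + writable);
    }
}
```

Anything missing from these lists (beyond what extra ImageIO plugins provide) falls into the "exotic wishes" category.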

If we had a Tika Wiki, we could at least list potential existing libraries
as well as libraries we'd like that don't exist yet. We could list licenses,
candidates for incubation, quality/maturity indicators...

Inside the XML Graphics project, we have the following available (in
case anyone is interested):
* XMP metadata framework in XML Graphics Commons, read/write, work in progress
* PostScript DSC in XML Graphics Commons, read/write (no PS interpreter!)
* PNG and TIFF codecs in XML Graphics Commons, read/write
* PDF in FOP, write only
* RTF in FOP, write only
* SVG in Batik, read/write

PDF (PDFBox @SourceForge), read/write, signalled interest for incubation

Personal wishlist:
ODF, read/write
Mars, read/write

On 10.07.2007 09:18:33 Carsten Ziegeler wrote:
> AFAIK there is currently no central place at Apache where
> libraries/frameworks for handling specific document formats are
> developed. We have individual projects like POI, of course.
> If you are searching for Java libraries that support a specific format,
> like some image formats, you'll find many libraries of varying quality,
> and it's really hard (if not impossible) to choose the right one.
> I'm wondering if something could be done about it by starting a project
> at Apache that supports various file formats (like images, MP3, etc.),
> perhaps by incubating some existing code.
> Although Tika is more a framework for plugging in such things, it might
> make sense to try to start something like that as subprojects of Tika?
> Carsten
> -- 
> Carsten Ziegeler
> cziegeler@apache.org

Jeremias Maerki
