tika-dev mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject XHTML Bean and corresponding content handler
Date Tue, 04 Aug 2009 07:30:17 GMT

I am currently thinking about developing an XHTML bean and corresponding 
content handler along the lines of:

Parser.parse(InputStream, XHTMLBeanContentHandler(XHTMLBean), Metadata);

String XHTMLBean.getHead().getMeta(XHTMLBean.DESCRIPTION)
String XHTMLBean.getHead().getTitle()
String[] XHTMLBean.getBody().getParagraphs();

I think this would simplify content retrieval, although I understand 
that it would not scale well memory-wise for large XHTML documents, 
since the extracted content is held in the bean.
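
A minimal sketch of how such a bean and handler might look is given below. It is only an illustration under stated assumptions: the class layout, the field names and the XHTMLBeanDemo helper are hypothetical, and only the accessor names come from the lines above. Because the handler is an ordinary SAX ContentHandler, it could in principle be handed to a parser that emits XHTML SAX events; to stay self-contained, the sketch drives it with the JDK's own SAX parser instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical bean holding the extracted parts of an XHTML document. */
class XHTMLBean {
    static final String DESCRIPTION = "description";

    static class Head {
        String title;
        final Map<String, String> meta = new HashMap<>();
        String getTitle() { return title; }
        String getMeta(String name) { return meta.get(name); }
    }

    static class Body {
        final List<String> paragraphs = new ArrayList<>();
        String[] getParagraphs() { return paragraphs.toArray(new String[0]); }
    }

    private final Head head = new Head();
    private final Body body = new Body();
    Head getHead() { return head; }
    Body getBody() { return body; }
}

/** Hypothetical handler that fills an XHTMLBean from SAX events. */
class XHTMLBeanContentHandler extends DefaultHandler {
    private final XHTMLBean bean;
    private StringBuilder text; // non-null only while inside <title> or <p>

    XHTMLBeanContentHandler(XHTMLBean bean) { this.bean = bean; }

    // Works with and without namespace awareness.
    private static String name(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        String name = name(localName, qName);
        if ("meta".equals(name)) {
            String key = atts.getValue("name");
            if (key != null) {
                bean.getHead().meta.put(key, atts.getValue("content"));
            }
        } else if ("title".equals(name) || "p".equals(name)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text == null) {
            return;
        }
        String name = name(localName, qName);
        if ("title".equals(name)) {
            bean.getHead().title = text.toString().trim();
            text = null;
        } else if ("p".equals(name)) {
            bean.getBody().paragraphs.add(text.toString().trim());
            text = null;
        }
    }
}

/** Hypothetical helper that parses an XHTML string with the JDK SAX parser. */
class XHTMLBeanDemo {
    static XHTMLBean parse(String xhtml) throws Exception {
        XHTMLBean bean = new XHTMLBean();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(
                new InputSource(new StringReader(xhtml)),
                new XHTMLBeanContentHandler(bean));
        return bean;
    }
}
```

Text inside inline elements such as <b> is folded into the enclosing paragraph. Every paragraph is kept in a list until the bean is discarded, which is where the memory cost for large documents comes from.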

Has anyone done something similar already? If not, and if the devs 
consider this a good idea, I would be happy to contribute the code.


