tika-dev mailing list archives

From Michael Wechner <michael.wech...@wyona.com>
Subject Re: XHTML Bean and corresponding content handler
Date Mon, 10 Aug 2009 12:58:23 GMT
Michael Wechner wrote:
>> I would suggest using BodyContentHandler instead of
>> WriteOutContentHandler. You can use it just like
>> WriteOutContentHandler, but it only outputs the contents of the
>> <body/> section. See the --text option in TikaCLI or the ParsingReader
>> class for good examples.
> yes, I have seen the BodyContentHandler, but it means I have to 
> explicitly concatenate the title (and the other metadata), which is 
> not that much
> effort,

I am now using the BodyContentHandler and aggregating the rest of the 
metadata (title, keywords, description, etc.), and it works well.
But as pointed out below, I think the WriteOutContentHandler is 
misleading, and its behaviour should either be changed or the class 
deprecated (with a note that one should use the BodyContentHandler instead).
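
To illustrate the idea of taking only the <body/> text and then adding the
title, keywords and description by hand, here is a minimal sketch using plain
JDK SAX rather than Tika itself. The class BodyAndMetadataSketch, its nested
SplittingHandler and the extract() method are hypothetical names made up for
this example; they are not Tika classes, and the real BodyContentHandler may
well work differently internally. The sketch also assumes well-formed XHTML
without a DOCTYPE.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class, not part of Tika.
class BodyAndMetadataSketch {

    /** Keeps <body> text apart from <title> text and <meta name/content> pairs. */
    static class SplittingHandler extends DefaultHandler {
        final StringBuilder body = new StringBuilder();
        final StringBuilder title = new StringBuilder();
        final Map<String, String> meta = new LinkedHashMap<>();
        private boolean inBody;
        private boolean inTitle;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("body".equals(qName)) {
                inBody = true;
            } else if ("title".equals(qName)) {
                inTitle = true;
            } else if ("meta".equals(qName) && atts.getValue("name") != null) {
                meta.put(atts.getValue("name"), atts.getValue("content"));
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("body".equals(qName)) {
                inBody = false;
            } else if ("title".equals(qName)) {
                inTitle = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text outside <body> and <title> is dropped.
            if (inBody) {
                body.append(ch, start, length);
            } else if (inTitle) {
                title.append(ch, start, length);
            }
        }
    }

    /** Returns title, keywords, description and body text, one per line. */
    static String extract(String xhtml) {
        SplittingHandler handler = new SplittingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XHTML", e);
        }
        StringBuilder out = new StringBuilder();
        append(out, handler.title.toString());
        append(out, handler.meta.get("keywords"));
        append(out, handler.meta.get("description"));
        append(out, handler.body.toString());
        return out.toString();
    }

    // Appends a non-empty value on its own line.
    private static void append(StringBuilder out, String value) {
        if (value == null || value.trim().isEmpty()) {
            return;
        }
        if (out.length() > 0) {
            out.append('\n');
        }
        out.append(value.trim());
    }
}
```

With Tika, the body text would come from the BodyContentHandler and the other
values from the Metadata object filled in by the parser; only the
concatenation step at the end of extract() corresponds to what I do in my code.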


> but as said I think it defeats the purpose of the 
> WriteOutContentHandler ;-)
> Thanks for your explanations
> Michael
>> BR,
>> Jukka Zitting
