tika-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [OT????] java.lang.IllegalStateException: NoWriterSupplied: No writer supplied for serializer.
Date Sat, 15 Nov 2008 20:56:38 GMT
> Although, I wonder about not passing start/endDocument.  Seems like
> even when you are restricting to an XPath, you still have the start
> and end of a document.

I was wondering about this, too, when first looking through the code. The
superclass ContentHandlerDecorator passes these events on, but
MatchingContentHandler explicitly removes them from delegation (by
supplying empty methods for start/endDocument()).
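
As a rough sketch of that delegation pattern, using only JDK classes rather
than Tika's own: FragmentForwarder and FragmentForwarderDemo are made-up
names, and XMLFilterImpl merely stands in for ContentHandlerDecorator here.
All events are forwarded except the two document-level ones.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in: forwards every SAX event to the target handler,
// except startDocument()/endDocument(), which are swallowed.
class FragmentForwarder extends XMLFilterImpl {
    FragmentForwarder(ContentHandler target) {
        setContentHandler(target);
    }

    @Override
    public void startDocument() {
        // intentionally empty: not delegated
    }

    @Override
    public void endDocument() {
        // intentionally empty: not delegated
    }
}

class FragmentForwarderDemo {
    // Records which events actually reach the downstream handler.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override public void startDocument() { seen.add("startDocument"); }
            @Override public void endDocument() { seen.add("endDocument"); }
            @Override public void startElement(String uri, String local,
                                               String qName, Attributes atts) {
                seen.add("start:" + qName);
            }
            @Override public void endElement(String uri, String local, String qName) {
                seen.add("end:" + qName);
            }
        };
        try {
            FragmentForwarder forwarder = new FragmentForwarder(recorder);
            forwarder.startDocument();
            forwarder.startElement("", "p", "p", new AttributesImpl());
            forwarder.endElement("", "p", "p");
            forwarder.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

The downstream handler only ever sees the element events, so anything that
relies on startDocument() for its setup never gets the chance to run.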

The original idea behind the MatchingContentHandler is to extract
DocumentFragments (as in DOM) or NodeSets (in the XPath sense). These are
not valid XML documents (they may contain more than one root element or no
root element at all). To generate a valid XML document from one, you have to
manually add the header and footer of a complete document.
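
A minimal sketch of adding that header and footer by hand, assuming the
JDK's identity TransformerHandler as the serializer. The class name, the
"fragments" root element and the two sample paragraphs are invented for
illustration; in a real pipeline the fragment events would come from the
matching handler instead of emitParagraph().

```java
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class FragmentWrapperDemo {
    // Emits one <p> element; a placeholder for the extracted fragment events.
    static void emitParagraph(ContentHandler out, String text) throws SAXException {
        out.startElement("", "p", "p", new AttributesImpl());
        out.characters(text.toCharArray(), 0, text.length());
        out.endElement("", "p", "p");
    }

    static String serializeWrapped() {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler serializer = factory.newTransformerHandler();
            StringWriter buffer = new StringWriter();
            serializer.setResult(new StreamResult(buffer)); // where the output goes

            // Header: document start plus a single synthetic root element.
            serializer.startDocument();
            serializer.startElement("", "fragments", "fragments", new AttributesImpl());

            // Body: two sibling fragments, which alone would not form a document.
            emitParagraph(serializer, "one");
            emitParagraph(serializer, "two");

            // Footer: close the synthetic root and end the document.
            serializer.endElement("", "fragments", "fragments");
            serializer.endDocument();
            return buffer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the wrapper in place the serializer receives exactly one root element
between startDocument() and endDocument(), whatever the fragments contain.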

The MatchingContentHandler is only meant for redirecting parts of the SAX
event stream (like NodeSets in real XPath) to specific handlers inside
parsers, not for the complete processing pipeline of whole documents.

I suspect the output of your handler pipeline is not a valid XML document,
and some serializers may not be able to handle this correctly at all (I have
seen bug reports on Xalan/Xerces serialization about this in the past). So
do not do this without checking that the document fed to XML serializers is
conformant and complete!
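
One cheap way to check this, for example in a unit test, is to re-parse the
serialized text with a plain SAX parser. This is only a sketch:
WellFormedCheck and isWellFormed are hypothetical names, and the check works
on the text output rather than on the SAX events themselves.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    // Returns true if the text parses as one complete XML document.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Fatal parse errors (several roots, no root, ...) end up here.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not a verdict on the input.
            throw new IllegalStateException(e);
        }
    }
}
```

A fragment with two sibling root elements, or with bare text and no root
element, fails this check, while the same content wrapped in a single root
element passes.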

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
eMail: uwe@thetaphi.de
