tika-dev mailing list archives

From "Scott Severtson (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-578) XMLParser ContentHandler: multiple endDocument calls
Date Wed, 22 Dec 2010 17:32:03 GMT
XMLParser ContentHandler: multiple endDocument calls

                 Key: TIKA-578
                 URL: https://issues.apache.org/jira/browse/TIKA-578
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.8
         Environment: N/A
            Reporter: Scott Severtson

When supplying a ContentHandler to an XMLParser instance, the ContentHandler's .endDocument()
method is called twice: once by the SAXParser created within XMLParser, and once explicitly by
XMLParser itself.

Sample code:
InputStream inputStream = ...
XMLParser parser = new DcXMLParser();
ParseContext context = new ParseContext();
Metadata metadata = new Metadata();

// Route the SAX events into a DOM tree
DOMResult result = new DOMResult();
TransformerHandler transformerHandler = ((SAXTransformerFactory) SAXTransformerFactory.newInstance()).newTransformerHandler();
transformerHandler.setResult(result);

parser.parse(inputStream, transformerHandler, metadata, context);

The following exception is produced:
	at java.util.Stack.peek(Stack.java:85)
	at java.util.Stack.pop(Stack.java:67)
	at com.sun.org.apache.xalan.internal.xsltc.trax.SAX2DOM.endDocument(SAX2DOM.java:143)
	at com.sun.org.apache.xml.internal.serializer.ToXMLSAXHandler.endDocument(ToXMLSAXHandler.java:181)
	at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerHandlerImpl.endDocument(TransformerHandlerImpl.java:231)
	at org.apache.tika.sax.ContentHandlerDecorator.endDocument(ContentHandlerDecorator.java:115)
	at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:212)
	at org.apache.tika.parser.xml.XMLParser.parse(XMLParser.java:71)

We have worked around the issue temporarily by passing in a ContentHandler that eats the first
.endDocument() call, and allows the second to go through. However, we believe XMLParser should
hide the extraneous .endDocument() call internally.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.