poi-dev mailing list archives

From Paul Spencer <pau...@apache.org>
Subject Re: DO NOT REPLY [Bug 49020] "org.xml.sax.SAXParseException: </b> does not close tag <br>." when opening some Excel 2007 files
Date Wed, 31 Mar 2010 12:50:49 GMT

See below.

On Mar 31, 2010, at 7:14 AM, bugzilla@apache.org wrote:

> https://issues.apache.org/bugzilla/show_bug.cgi?id=49020
> --- Comment #3 from Nick Burch <nick.burch@alfresco.com> 2010-03-31 11:14:53 UTC
> The bug is really with Excel here - it has generated a file with invalid XML.
> The xlsx file is defined as being made up of XML subparts, and the XML spec is
> very very strict on matching tags.
> For the long term, you should report a bug to Microsoft about this. They either
> need to sanitise the user input and sort out the tags (eg <br> becomes <br />),
> or they need to give up and escape the whole tag contents for the bits where
> iffy data could get added (eg put this textbox within a CDATA section)

I will report the bug to Microsoft, but that does not address existing files.

> Short term, you could just comment out the code that reads in the vmlDrawing
> section of the file, and ensure that you don't touch the drawing records

Please expand on "just comment out the code that reads in the vmlDrawing section". Since my
application supports many versions of Excel, I use WorkbookFactory.create() to read the file.
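Until POI handles this itself, one possible workaround that keeps WorkbookFactory.create() unchanged is to repair the package before POI ever sees it. This is a minimal sketch. It assumes, as the bug discussion suggests, that the malformed markup is a bare `<br>` inside the VML drawing parts, meaning zip entries whose names contain `vmlDrawing` and end in `.vml` (for example `xl/drawings/vmlDrawing1.vml`). The class `VmlRepair` and its methods are hypothetical and are not part of POI.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of POI): patches VML parts of an .xlsx package. */
final class VmlRepair {

    /**
     * Copies the .xlsx package read from {@code xlsx}, rewriting bare <br>
     * tags in VML drawing parts as <br/>. Every other entry is copied
     * byte for byte. The input stream is closed when done.
     */
    static byte[] repair(InputStream xlsx) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(xlsx);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes();  // contents of the current entry
                if (isVmlDrawing(entry.getName())) {
                    data = fixBreaks(data);
                }
                // A fresh ZipEntry, so sizes and CRC are recomputed for the new data.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Assumption: VML drawing parts are named like xl/drawings/vmlDrawing1.vml. */
    static boolean isVmlDrawing(String name) {
        return name.contains("vmlDrawing") && name.endsWith(".vml");
    }

    static byte[] fixBreaks(byte[] data) {
        // ISO-8859-1 maps each byte to exactly one char and back, so the round
        // trip leaves non-ASCII bytes (UTF-8 or a legacy code page) untouched.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        String fixed = text.replaceAll("(?i)<br\\s*>", "<br/>");
        return fixed.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

The repaired bytes could then be opened as usual with `WorkbookFactory.create(new ByteArrayInputStream(repaired))`. This sketch buffers the whole package in memory. Its regex also rewrites `<br>` anywhere in the VML text, including inside CDATA, so it is a stopgap rather than a general fix.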

> Medium term, we should get a list of the problem bits that Excel does wrong,
> such as <br> (but perhaps others). Then, we need to write an XML Input Wrapper
> that cleans these up before they get passed to the XML Processor for loading.
> Something like this is quite nasty, though it's possible some other project out
> there has already done it, and we can just re-use what they do.

I like this as a solution.
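The "XML Input Wrapper" idea could look something like the following minimal sketch. It is a java.io.Reader that sits between the raw part and the XML parser and rewrites a bare `<br>` as `<br/>` while streaming. `BrFixingReader` is a hypothetical name and not a POI class. The sketch handles only the exact `<br>` form (not `<br >` or `<br clear=all>`), and it would also rewrite `<br>` inside CDATA sections or comments. Where such a wrapper would plug into POI's part loading is not shown.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

/**
 * Hypothetical wrapper (not part of POI): streams characters from another
 * Reader and rewrites a bare "<br>" (any case) as a self-closing "<br/>".
 */
class BrFixingReader extends Reader {
    private final PushbackReader src;
    private String pending = "";  // rest of a replacement not yet returned
    private int pos = 0;

    BrFixingReader(Reader in) {
        // Three chars of pushback: enough to un-read the "br>" lookahead.
        this.src = new PushbackReader(in, 3);
    }

    /** Returns the next output char, or -1 at end of input. */
    private int next() throws IOException {
        if (pos < pending.length()) {
            return pending.charAt(pos++);
        }
        int c = src.read();
        if (c != '<') {
            return c;  // ordinary character or -1
        }
        char[] look = new char[3];
        int n = 0;
        while (n < 3) {
            int r = src.read();
            if (r == -1) break;
            look[n++] = (char) r;
        }
        if (n == 3 && Character.toLowerCase(look[0]) == 'b'
                && Character.toLowerCase(look[1]) == 'r' && look[2] == '>') {
            pending = new String(look, 0, 2) + "/>";  // keep the original case
            pos = 0;
            return '<';
        }
        // Not "<br>": push the lookahead back so it is scanned again,
        // which matters for input such as "<<br>".
        src.unread(look, 0, n);
        return '<';
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        while (count < len) {
            int c = next();
            if (c == -1) break;
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        src.close();
    }
}
```

Any parser that accepts a character stream could consume it, for example `DocumentBuilder.parse(new InputSource(new BrFixingReader(reader)))`. Because the wrapper works on characters, the caller has to decode the part first, for instance with an InputStreamReader in the part's encoding. Further problem tags, once identified, could be handled by extending the lookahead.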


Paul Spencer

To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
