poi-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 48936] Writing specific sequence of strings to XSSFSheet results in malformed XML
Date Sun, 25 Apr 2010 13:03:38 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=48936

Yegor Kozlov <yegor@dinom.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #2 from Yegor Kozlov <yegor@dinom.ru> 2010-04-25 09:03:35 EDT ---
Quite an interesting bug. 
The problem is in the way XmlBeans detects and writes CDATA blocks. Luckily, we
can control this funny behavior.

I traced the problem down to the internal class TextSaver in XmlBeans: 

http://svn.apache.org/viewvc/xmlbeans/tags/2.3.0/src/store/org/apache/xmlbeans/impl/store/Saver.java?view=markup

The logic for detecting CDATA starts at line #1286. The heuristic is quite
complex, but it turns out that it can be controlled with two options:

  XmlOptions#setSaveCDataLengthThreshold(int)
  XmlOptions#setSaveCDataEntityCountThreshold(int)

The default value of cdataEntityCountThreshold is 5 and the default value of
cdataLengthThreshold is 32. These values perfectly agree with Josh's
observations. 

According to the docs, XmlBeans will use CDATA if the following condition is
true:
    textLength > cdataLengthThreshold && entityCount > cdataEntityCountThreshold

The combination of XmlOptions.setSaveCDataLengthThreshold(0) and
XmlOptions.setSaveCDataEntityCountThreshold(-1) will make every text CDATA.

The combination of XmlOptions.setSaveCDataLengthThreshold(MAXLENGTH) and
XmlOptions.setSaveCDataEntityCountThreshold(-1) will produce CDATA only if the
text is longer than MAXLENGTH chars. I used the following values to effectively
disable CDATA when saving sharedStrings.xml:

  XmlOptions options = new XmlOptions(DEFAULT_XML_OPTIONS);
  options.setSaveCDataLengthThreshold(1000000); 
  options.setSaveCDataEntityCountThreshold(-1);
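
As a rough illustration of why these values suppress CDATA, the documented
condition can be modelled as a plain predicate. This is only a sketch of the
condition quoted from the docs, not the actual heuristic in Saver.java, and the
names CDataThresholdDemo and usesCData are made up for the example:

```java
public class CDataThresholdDemo {
    // Models the documented condition only: CDATA is used when BOTH
    // comparisons hold. The real heuristic in Saver.java is more involved.
    static boolean usesCData(int textLength, int entityCount,
                             int lengthThreshold, int entityCountThreshold) {
        return textLength > lengthThreshold && entityCount > entityCountThreshold;
    }

    public static void main(String[] args) {
        // Default thresholds: length 32, entity count 5.
        System.out.println(usesCData(40, 6, 32, 5));            // true
        System.out.println(usesCData(40, 5, 32, 5));            // false

        // Values used in the fix: length 1000000, entity count -1.
        System.out.println(usesCData(40, 6, 1000000, -1));      // false
        System.out.println(usesCData(1000001, 0, 1000000, -1)); // true
    }
}
```

Under this model, only text longer than 1000000 chars would still be written as
CDATA, which should not occur for ordinary shared strings.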


I committed the fix in r937792. 

Existing code using POI-3.6 can be fixed as follows:

  XmlOptions options = POIXMLDocumentPart.DEFAULT_XML_OPTIONS;
  options.setSaveCDataLengthThreshold(1000000);
  options.setSaveCDataEntityCountThreshold(-1);

Add these lines before calling workbook.write(out). Note that they modify the
shared DEFAULT_XML_OPTIONS instance rather than a copy of it.

Regards,
Yegor


