burton 2004/08/02 18:26:58
Modified: feedparser/src/java/org/apache/commons/feedparser
FeedParser.java
Log:
Fixed a bad bug in FeedParser with UTF-8 encoding of content, caused by an internal bug in JDOM
and the JDK.
Revision Changes Path
1.6 +10 -2 jakarta-commons-sandbox/feedparser/src/java/org/apache/commons/feedparser/FeedParser.java
Index: FeedParser.java
===================================================================
RCS file: /home/cvs/jakarta-commons-sandbox/feedparser/src/java/org/apache/commons/feedparser/FeedParser.java,v
retrieving revision 1.5
retrieving revision 1.6
diff -u -r1.5 -r1.6
--- FeedParser.java 3 Aug 2004 01:24:17 -0000 1.5
+++ FeedParser.java 3 Aug 2004 01:26:58 -0000 1.6
@@ -56,7 +56,13 @@
try {
- //Need to massage our XML support forfor UTF-8 to prevent
+ // Need to massage our XML support for UTF-8 to prevent the
+ // dreaded "Invalid byte 1 of 1-byte UTF-8 sequence" content bug in
+ // some default feeds. This was tested a great deal under
+ // NewsMonster and I'm happy with the results. Within FeedParser
+ // 2.0 we will be using SAX2 so this won't be as big of a problem.
+ // In FeedParser 2.0 (or as soon as we use SAX) this code should be
+ // totally removed to use the original stream.
byte[] bytes = toByteArray( is );
String encoding = XMLEncodingParser.parse( bytes );
@@ -75,6 +81,8 @@
is = new ByteArrayInputStream( bytes );
}
+ //OK. Now we have the right InputStream so we should build our DOM
+ //and exec.
DOMBuilder builder = new DOMBuilder();
org.jdom.Document doc = builder.build( is );
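
For readers following the change, here is a minimal, self-contained sketch of the
buffer / detect / re-wrap pattern that the comment above describes. It is not the
FeedParser code: the class name EncodingSniffSketch, the sniffEncoding helper and
its regular expression are hypothetical stand-ins for XMLEncodingParser.parse,
whose implementation is not part of this diff, and the UTF-8 fallback is an
assumption. Whatever happens between the two hunks (the lines the diff does not
show) is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of commons-feedparser.
final class EncodingSniffSketch {

    // Assumed pattern: captures the value of encoding="..." in an XML declaration.
    private static final Pattern ENCODING =
        Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    // Drain the stream into memory so it can be inspected and then re-read.
    static byte[] toByteArray(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = is.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Look at the first bytes only. ISO-8859-1 maps every byte to one char,
    // so decoding the prolog this way cannot fail on malformed input.
    static String sniffEncoding(byte[] bytes) {
        int len = Math.min(bytes.length, 256);
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1) : "UTF-8"; // assumed default
    }

    public static void main(String[] args) throws IOException {
        byte[] feed = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><rss/>"
            .getBytes(StandardCharsets.ISO_8859_1);

        byte[] bytes = toByteArray(new ByteArrayInputStream(feed));
        String encoding = sniffEncoding(bytes);

        // A fresh stream over the same bytes is what a DOM builder would read.
        InputStream is = new ByteArrayInputStream(bytes);
        System.out.println(encoding + ", " + is.available() + " bytes buffered");
    }
}
```

Buffering the whole feed costs memory in proportion to its size, which is
presumably part of why the comment wants this code removed once the parser reads
the original stream through SAX.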