commons-dev mailing list archives

From Kristian Rosenvold <krosenv...@apache.org>
Subject [io] IBM JDK and broken UTF-16 (Related to release 2.5 RC 1)
Date Fri, 11 Dec 2015 09:23:18 GMT
I've been digging deeply into the IBM JDK 6/7-related breakages in IO 2.5
RC1.

A lot of them can be explained by the different capabilities of the XML
parsers in the different JDKs, and I have come up with a decent heuristic
for detecting this and skipping the affected tests.

There are also a couple of legacy oddball character sets supported by the
IBM JDK that simply do not support round-tripping the French string in the
testcase. (Nerdy side note: take a look at the 7-bit Japanese/Chinese
encodings in https://en.wikipedia.org/wiki/ISO/IEC_2022 !). These can just
be excluded from the testcase.
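
To illustrate what "does not support round-tripping" means here, a minimal
sketch of such a check follows. The class name, method names and the sample
string are hypothetical and are not taken from the actual testcase; the
sketch only assumes that a charset is worth excluding when encoding and then
decoding a string does not give back the original.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of commons-io or its tests.
class RoundTripCheck {
    // Assumed sample with accented characters; the real test string may differ.
    static final String SAMPLE = "\u00e0 peine arriv\u00e9s nous entr\u00e2mes";

    // True if encoding and then decoding the text gives back the same text.
    static boolean roundTrips(Charset cs, String text) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot round-trip at all
        }
        try {
            byte[] encoded = text.getBytes(cs); // unmappable chars are replaced
            return text.equals(new String(encoded, cs));
        } catch (RuntimeException e) {
            return false; // treat any codec failure as "does not round-trip"
        }
    }

    // Names of all installed charsets that fail the round-trip for the text.
    static List<String> charsetsToExclude(String text) {
        List<String> excluded = new ArrayList<String>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (!roundTrips(cs, text)) {
                excluded.add(cs.name());
            }
        }
        return excluded;
    }
}
```

A test could call charsetsToExclude(SAMPLE) once and skip the listed
charsets, rather than hard-coding a list per JDK vendor.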

But the UTF-16 decoder in IBM JDK 6 and 7 is simply broken when fed a
single byte at a time (it works fine with a full byte array as input). This
is bad news for the WriterOutputStream, which is quite fundamentally based
on outputting single bytes. Whereas the other problems can be fixed by
improving the testcase, I really believe the WriterOutputStream should
just throw UnsupportedOperationException on IBM JDK 6/7 with UTF-16.
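
A minimal sketch of how this could be detected at runtime follows. It is
only an illustration under stated assumptions, not the fix itself: the class
Utf16Probe, its method names and the probe string are all hypothetical, and
it probes the decoder's behaviour instead of checking the JDK vendor. It
feeds the encoded bytes to a CharsetDecoder one at a time and compares the
result with the original string.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

// Hypothetical probe, not part of commons-io.
class Utf16Probe {
    // Assumed probe text; any short string with a non-ASCII char would do.
    static final String PROBE = "v\u00e9s";

    // True if the charset's decoder reproduces the sample when it is
    // handed the encoded bytes one byte at a time.
    static boolean decodesByteAtATime(Charset charset, String sample) {
        byte[] bytes = sample.getBytes(charset);
        CharsetDecoder decoder = charset.newDecoder();
        ByteBuffer in = ByteBuffer.allocate(16);
        CharBuffer out = CharBuffer.allocate(sample.length());
        try {
            for (int i = 0; i < bytes.length; i++) {
                in.put(bytes[i]);
                in.flip();
                // endOfInput is true only for the final byte
                decoder.decode(in, out, i == bytes.length - 1);
                in.compact(); // keep any bytes the decoder has not consumed
            }
            decoder.flush(out);
        } catch (RuntimeException e) {
            return false; // a throwing decoder counts as broken
        }
        out.flip();
        return sample.equals(out.toString());
    }

    // Guard in the spirit of the proposal: refuse UTF-16 if the probe fails.
    static void requireWorkingUtf16(Charset charset) {
        if ("UTF-16".equals(charset.name())
                && !decodesByteAtATime(charset, PROBE)) {
            throw new UnsupportedOperationException(
                "UTF-16 decoder fails when fed single bytes on this JDK");
        }
    }
}
```

A constructor could call requireWorkingUtf16(charset) before doing any work,
so that the failure is reported up front instead of as corrupted output.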

WDYT ?

Kristian
