commons-dev mailing list archives

From Leandro Reis
Subject Re: [io] support for additional character sets needed in ReversedLinesFileReader
Date Tue, 03 Mar 2015 00:02:28 GMT
On 2 March 2015 at 21:53, sebb wrote:

>>On 2 March 2015 at 20:00, Leandro Reis wrote:
>>Hi all,
>>I'm working on a product that uses Commons IO via Jackrabbit Oak. In the
>>process of testing the launch of such product on Japanese Windows 2012
>>Server R2, I came across the following exception:
>>"( Encoding windows-31j is not
>>supported yet (feel free to submit a patch))"
>>windows-31j is the IANA name for Windows code page 932 (Japanese), and is
>>returned by Charset.defaultCharset(), used in
>> [0].
>>It looks like this issue could be addressed by adding a check for
>>"windows-31j" to ReversedLinesFileReader(final File file, final int
>>blockSize, final Charset encoding):
>>} else if(charset.equals(Charset.forName("windows-31j"))) {
>>     byteDecrement = 1;
>>Similar changes would be needed in order to support the Chinese
>>Simplified, Chinese Traditional, and Korean versions of the same OS (I'm
>>checking what the corresponding encoding names are).
>>Can someone familiar with this area of the code confirm this looks like
>>the proper approach to addressing this?

>Can a newline byte ever appear as part of a multi-byte character in any
>of those encodings?
No. Sources:
- Japanese:
- Simplified Chinese:
- Korean:
- Traditional Chinese:
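
As a rough cross-check of the answer above, something like the following brute-force sketch could be run against any charset the JVM supports. The class and method names (NewlineByteCheck, newlineInMultiByte) are made up for illustration and are not part of Commons IO. The sketch only covers BMP characters encoded one at a time, so it is a sanity check rather than a proof, and stateful encodings may need a closer look.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons IO.
public class NewlineByteCheck {

    /**
     * Returns true if the byte 0x0A (LF) or 0x0D (CR) occurs in the
     * multi-byte encoding of any single BMP character in the given charset.
     */
    public static boolean newlineInMultiByte(Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) {
                continue; // lone surrogates cannot be encoded on their own
            }
            ByteBuffer bytes;
            try {
                // encode(CharBuffer) resets the encoder before each call
                bytes = encoder.encode(CharBuffer.wrap(new char[] {(char) c}));
            } catch (CharacterCodingException e) {
                continue; // character has no mapping in this charset
            }
            if (bytes.remaining() < 2) {
                continue; // single-byte encodings are not of interest here
            }
            while (bytes.hasRemaining()) {
                byte b = bytes.get();
                if (b == 0x0A || b == 0x0D) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // windows-31j may be missing from a reduced JVM, so guard the lookup.
        if (Charset.isSupported("windows-31j")) {
            System.out.println("windows-31j: "
                    + newlineInMultiByte(Charset.forName("windows-31j")));
        }
        System.out.println("UTF-8: " + newlineInMultiByte(StandardCharsets.UTF_8));
        System.out.println("UTF-16BE: " + newlineInMultiByte(StandardCharsets.UTF_16BE));
    }
}
```

A result of false is what would make a byteDecrement of 1 safe for a charset, since a newline byte found while scanning backwards could then never be the tail of a longer character. UTF-16BE is included as a contrast case, because there a newline byte does appear inside two-byte units.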

>> Leandro
