openoffice-dev mailing list archives

From Andrew Douglas Pitonyak <and...@pitonyak.org>
Subject Re: Improvements of OUString
Date Wed, 04 Dec 2013 06:26:25 GMT

On 12/03/2013 11:27 AM, Herbert Duerr wrote:
>
>>>> We should drop our support for ASCII?
>>>
>>> UTF-8 contains ASCII. This was one of its most important design goals
>>> and IMHO is a key factor that made this encoding such a big success.
>>> [...]
>>
>> Hm, UTF-8 is not identical to ASCII.  What if I want to write an
>> OUString to stdout?  Does a regular printf support UTF-8 or would I need
>> a conversion from UTF-8 to ASCII for that?
>
> If you have an ASCII string then you can directly print it in a UTF-8 
> locale. No conversion needed. Also the inverse is true: if that string 
> was encoded as UTF-8 then you can print it directly in an ASCII 
> compatible locale. No conversion needed for the output. The result 
> would be exactly the same.
>
> printf() and friends support the encoding defined by the LC_CTYPE 
> environment variable. Nowadays this is very very often UTF-8, which is 
> backward compatible with ASCII.
>
> Some encodings are not ASCII compatible though, e.g. EBCDIC or DBCS 
> (double-byte character sets). If you printed ASCII text in such 
> environments without converting it first then you'd get gibberish. 
> So if you want to make sure that what you want is what you get then 
> you should always convert to the local encoding as determined by 
> osl_getThreadTextEncoding().
>
> But ASCII and UTF-8 encodings are quite dominant nowadays, especially 
> on developer machines. While we could fix all debug-printing for 
> non-ASCII-compatible environments, I suggest not investing too much 
> energy in such a task. Weighing the number of developers we'd win by 
> supporting e.g. EBCDIC-based development environments against the 
> developer effort needed to achieve that support, the balance would 
> most probably be negative.
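
A small sketch of the point above, written in Python rather than against the UNO/osl C++ API. `encode_for_console` is a hypothetical helper; using `locale.getpreferredencoding(False)` as a stand-in for `osl_getThreadTextEncoding()` is an assumption for illustration only. It shows that ASCII text encodes to the same bytes in ASCII, UTF-8 and ISO-8859-1, but not in an EBCDIC code page (Python's `cp037` codec).

```python
import locale

def encode_for_console(text: str) -> bytes:
    """Hypothetical helper: encode text in the locale's preferred encoding.

    Only a rough Python analogue of converting an OUString to the encoding
    reported by osl_getThreadTextEncoding(); it is not that API.
    """
    encoding = locale.getpreferredencoding(False)
    # Characters the target encoding cannot represent become '?' instead of raising.
    return text.encode(encoding, errors="replace")

sample = "Hello, OUString"

# ASCII-compatible encodings give byte-for-byte identical output for ASCII text...
assert sample.encode("ascii") == sample.encode("utf-8") == sample.encode("latin-1")

# ...but an EBCDIC code page does not: there 'H' is 0xC8, not 0x48.
assert sample.encode("cp037")[0] == 0xC8
assert sample.encode("cp037") != sample.encode("ascii")
```

Writing `encode_for_console(message)` to a binary stream would then emit bytes that match the current locale, at the cost of a conversion that is a no-op on the common ASCII/UTF-8 setups.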

I would have said that the ASCII values from 0 to 127 are the same in 
UTF-8, but byte values greater than 127 (often loosely called "extended 
ASCII", e.g. ISO-8859-1 or Windows-1252) are a problem. I recently ran 
into this when a document contained "ASCII 160", a non-breaking space. 
I became aware of it when I was asked "hey, why does this file look 
different after it was converted to UTF-8?"
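
A minimal Python sketch of what likely happened, assuming (the message does not say) that the original file was ISO-8859-1 or Windows-1252, where byte 160 is a non-breaking space. `latin1_to_utf8` and the sample bytes are hypothetical. In UTF-8 the same character needs two bytes, 0xC2 0xA0, so the file changes even though bytes 0-127 pass through untouched.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.

    Bytes 0-127 pass through unchanged; bytes 128-255 become two bytes each.
    """
    return data.decode("latin-1").encode("utf-8")

original = b"10\xa0kg"            # "10 kg" with a non-breaking space (byte 160)
converted = latin1_to_utf8(original)
print(converted)                  # b'10\xc2\xa0kg': the one NBSP byte became two

# Viewing the converted bytes as Latin-1 again shows an extra 'Â' (0xC2),
# one way the file can "look different" after conversion.
print(converted.decode("latin-1") == "10\u00c2\u00a0kg")   # True

# The original bytes are not valid UTF-8: a lone 0xA0 is a continuation byte.
try:
    original.decode("utf-8")
except UnicodeDecodeError:
    print("original is not valid UTF-8")
```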

-- 
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
Info:  http://www.pitonyak.org/oo.php


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
For additional commands, e-mail: dev-help@openoffice.apache.org

