openoffice-dev mailing list archives

From Andre Fischer
Subject Re: Improvements of OUString
Date Tue, 03 Dec 2013 14:37:22 GMT
On 03.12.2013 14:32, Herbert Duerr wrote:
> On 03.12.2013 13:02, Andre Fischer wrote:
>> On 03.12.2013 10:35, Herbert Duerr wrote:
>>> On 03.12.2013 09:13, Andre Fischer wrote:
>>> [...]
>>> "The method isEmpty() returns true if the string is empty. If the
>>> length of the string is one or two or three or any number bigger than
>>> zero then isEmpty() returns false."
>> In addition to this almost correct statement, one could mention that
>> isEmpty() is preferred over getLength()>0 and why.
> Yes, it is preferred for checking the emptiness because it directly 
> expresses what it checks.
> In general it is a good idea to check for emptiness instead of 
> counting the elements and then comparing against zero. It's the old 
> "interface vs. implementation detail" question. The result will be the 
> same from a mathematical standpoint but the effort to get this result 
> may be different. From an algorithmic complexity standpoint an 
> emptiness check is always equal or better. Maybe a mathematician can 
> provide some insights from the set theory on this question?
> By the way: the String class of Java>=6 got its isEmpty() method for 
> the same reasons.

Can you add some of this to the documentation of isEmpty()? (maybe don't 
mention set theory)
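
To illustrate the "equal or better" complexity argument quoted above, here is a minimal sketch using a standard C++ container rather than OUString. As far as I know OUString stores its length, so both checks are cheap there; the sketch only shows why an emptiness check is the safer interface in general. The helper names countElements and hasElements are made up for this example.

```cpp
#include <cstddef>
#include <forward_list>
#include <iterator>

// Counting the elements of a singly linked list walks every node: O(n).
inline std::size_t countElements(const std::forward_list<int>& items)
{
    return static_cast<std::size_t>(std::distance(items.begin(), items.end()));
}

// Checking for emptiness only inspects the head of the list: O(1).
inline bool hasElements(const std::forward_list<int>& items)
{
    return !items.empty();
}
```

Both functions answer "is there anything in here?" with the same result, but only the second one does so without depending on how the container computes its size.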

>> Can you tell me what happens when an OUString is created for "\0". Is
>> that handled as end-of-string or as just one additional character?
> What happens during the string construction is unchanged. So if you 
> were using the
>     OString( "\0")
> it did and does create a zero-length string. If you were using the 
> constructor with length argument
>     OString( "\0", 1)
> then the length was and is 1, because 1 was provided as length argument.
> Only a string without any elements is empty. A string with one or more 
> elements is considered non-empty even if all its elements are zero. So 
> if you used a test like aString.getLength()==0 before you can use 
> aString.isEmpty() directly.
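
The same distinction can be reproduced with std::string, which is used here only as a stand-in because it is available without the sal headers; it is not a statement about the OString/OUString implementation. The helper names makeFromCString and makeWithLength are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Built from a C string: scanning stops at the first NUL,
// so a literal that starts with "\0" yields an empty string.
inline std::string makeFromCString(const char* text)
{
    return std::string(text);
}

// Built with an explicit length: exactly `length` bytes are copied,
// embedded NUL bytes included, so the result is not empty.
inline std::string makeWithLength(const char* text, std::size_t length)
{
    return std::string(text, length);
}
```

With these helpers, makeFromCString("\0") is empty, while makeWithLength("\0", 1) has length 1 and is therefore non-empty even though its only element is zero.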
>>> [...]
>>> Also we shouldn't bother our main string classes with non-unicode
>>> support. Having external tooling for converting from/to other
>>> encodings is still needed though.
>> We should drop our support for ASCII?
> UTF-8 contains ASCII. This was one of its most important design goals 
> and IMHO is a key factor that made this encoding such a big success.
> Speaking of UTF-8 vs. ASCII, I suggest changing the O*String methods 
> such as createFromAscii() to createFromUtf8().

Hm, UTF-8 is not identical to ASCII.  What if I want to write an 
OUString to stdout?  Does a regular printf support UTF-8 or would I need 
a conversion from UTF-8 to ASCII for that?  If so, it would be 
convenient to have that directly at OUString, not in some external library.
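
As a rough sketch of the relationship in question: every ASCII string is already valid UTF-8, byte for byte, so a conversion is only an issue for strings that contain bytes outside the 7-bit range. printf itself just writes the bytes it is given; whether non-ASCII UTF-8 is displayed correctly depends on the terminal and locale. The function name isPureAscii is an assumption made for this example, not an existing O*String method.

```cpp
#include <string>

// Returns true if every byte is in the 7-bit ASCII range (0x00..0x7F).
// Such a byte sequence is identical in ASCII and in UTF-8.
inline bool isPureAscii(const std::string& utf8)
{
    for (unsigned char byte : utf8)
    {
        if (byte >= 0x80)
            return false; // lead or continuation byte of a multi-byte sequence
    }
    return true;
}
```

A string for which isPureAscii() returns true can be passed to printf("%s", ...) unchanged; for anything else, a lossy conversion or a UTF-8 capable output device is needed.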

>>>> [...]
>>>>      ::rtl::OUStringToOString(sOUStringVariable,
>>> This awful construct could be made much simpler if our strings were
>>> always unicode (UTF-8/UTF-16/UTF-32).
>> I thought that OUString is UTF-16 and that that was the cause, not the
>> solution, of the conversion problems.
> The complexity of the awful construct comes from the use of the 
> general purpose machinery for an N:1 conversion (with N being the 
> number of supported byte encodings). A 1:1 conversion (UTF-8 <-> 
> UTF-16) is much simpler.

I think you are mixing up two concepts here.  One is the ability to 
convert an OUString to/from all text encodings defined in 
sal/inc/rtl/textenc.h.  The other is a possible replacement of the 
UTF-16 implementation of OUString with UTF-8.  As long as we don't drop 
the ability to convert between text encodings we will have a 1:N (N 
being 94 if I counted correctly) relationship, which can then be used to 
realize an N:N relationship for arbitrary conversion between encodings (I 
am not sure that that really works).  If we change the implementation of 
OUString we will have a 1:1 relationship, regardless of which new 
encoding/format we use.
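
For reference, the 1:1 case (UTF-8 to UTF-16) can be sketched in a few dozen lines of standard C++. This is only an illustration under stated assumptions: the function name utf8ToUtf16 is hypothetical, the error handling (throwing std::invalid_argument) is a choice made for the example, and it is not the converter machinery behind OUStringToOString.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 and re-encodes it as UTF-16.
// Throws std::invalid_argument on malformed input.
inline std::u16string utf8ToUtf16(const std::string& utf8)
{
    // Smallest code point that legitimately needs 0, 1, 2 or 3 continuation bytes.
    static const std::uint32_t minimum[] = {0x0, 0x80, 0x800, 0x10000};

    std::u16string result;
    std::size_t i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t codePoint = 0;
        std::size_t extra = 0; // number of continuation bytes that follow

        if (lead < 0x80)                { codePoint = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { codePoint = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { codePoint = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { codePoint = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(utf8[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            codePoint = (codePoint << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogates and values beyond U+10FFFF.
        if (codePoint < minimum[extra] || codePoint > 0x10FFFF
            || (codePoint >= 0xD800 && codePoint <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (codePoint < 0x10000)
        {
            result.push_back(static_cast<char16_t>(codePoint));
        }
        else
        {
            // Code points above the BMP become a surrogate pair.
            codePoint -= 0x10000;
            result.push_back(static_cast<char16_t>(0xD800 + (codePoint >> 10)));
            result.push_back(static_cast<char16_t>(0xDC00 + (codePoint & 0x3FF)));
        }
    }
    return result;
}
```

No encoding tables are involved; the whole conversion is bit shuffling between two representations of the same code points, which is what distinguishes the 1:1 case from the general 1:N conversion.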

> As I wrote I'd even like to go full UTF-8 inside AOO. Most of the back 
> and forth transcodings between UTF-8 and UTF-16 inside AOO are just 
> wasteful.

I agree


> Herbert