openoffice-dev mailing list archives

From Herbert Duerr <...@apache.org>
Subject Re: Improvements of OUString
Date Tue, 03 Dec 2013 13:32:35 GMT
On 03.12.2013 13:02, Andre Fischer wrote:
> On 03.12.2013 10:35, Herbert Duerr wrote:
>> On 03.12.2013 09:13, Andre Fischer wrote:
>> [...]
>> "The method isEmpty() returns true if the string is empty. If the
>> length of the string is one or two or three or any number bigger than
>> zero then isEmpty() returns false."
>
> Additionally to this almost correct statement one could mention that
> isEmpty() is preferred over getLength()>0 and why.

Yes, it is preferred for checking the emptiness because it directly 
expresses what it checks.

In general it is a good idea to check for emptiness instead of counting 
the elements and then comparing against zero. It's the old "interface vs. 
implementation detail" question. The result is the same from a 
mathematical standpoint, but the effort to get it may differ. From an 
algorithmic complexity standpoint an emptiness check is never more 
expensive than counting. Maybe a mathematician can provide some insights 
from set theory on this question?
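
A minimal sketch of that complexity argument, using std::forward_list as 
a stand-in container (an assumption for illustration; it is not how 
OUString is implemented). Counting has to walk every node, while the 
emptiness check only looks at the head. The helper names countElements, 
isEmpty and checkAgreement are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>

// Counting walks the whole singly linked list: O(n).
inline std::size_t countElements(const std::forward_list<int>& items) {
    return static_cast<std::size_t>(std::distance(items.begin(), items.end()));
}

// The emptiness check only inspects the head of the list: O(1).
inline bool isEmpty(const std::forward_list<int>& items) {
    return items.empty();
}

// Both checks give the same answer; only the cost of getting it differs.
inline void checkAgreement(const std::forward_list<int>& items) {
    assert(isEmpty(items) == (countElements(items) == 0));
}
```

For a string class that stores its length, both checks are cheap, so 
there the argument is mainly about saying directly what is being tested.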

By the way: the String class of Java>=6 got its isEmpty() method for the 
same reasons.

> Can you tell me what happens when an OUString is created for "\0". Is
> that handled as end-of-string or as just one additional character?

What happens during string construction is unchanged. So if you were 
using
	OString( "\0")
it did and does create a zero-length string. If you were using the 
constructor with a length argument
	OString( "\0", 1)
then the length was and is 1, because 1 was provided as the length argument.

Only a string without any elements is empty. A string with one or more 
elements is considered non-empty even if all its elements are zero. So 
if you used a test like aString.getLength()==0 before, you can use 
aString.isEmpty() directly.
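
The same two cases can be tried with std::string, which is used here 
only as a stand-in for OString (an assumption for illustration; the 
helper names are hypothetical). The constructor without a length scans 
for the terminator, the one with a length takes the count as given.

```cpp
#include <cassert>
#include <string>

// Length is found by scanning for the terminator, so the result is empty.
inline std::string fromTerminatedLiteral() {
    return std::string("\0");
}

// Length is given explicitly, so the string holds one element: a NUL.
inline std::string fromLiteralWithLength() {
    return std::string("\0", 1);
}

// A string of one NUL element is non-empty; only zero elements means empty.
inline void checkNulHandling() {
    assert(fromTerminatedLiteral().empty());
    assert(fromLiteralWithLength().size() == 1);
    assert(!fromLiteralWithLength().empty());
}
```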

>> [...]
>> Also we shouldn't bother our main string classes with non-unicode
>> support. Having external tooling for converting from/to other
>> encodings is still needed though.
>
> We should drop our support for ASCII?

UTF-8 contains ASCII. This was one of its most important design goals 
and IMHO is a key factor that made this encoding such a big success.
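
A small sketch of why that holds: a UTF-8 encoder maps every code point 
below 0x80 to the single byte with the same value, so pure ASCII text is 
already valid UTF-8. The helpers encodeUtf8 and asciiToUtf8 are 
hypothetical names and not part of any AOO API.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: surrogates and values above U+10FFFF are not rejected.
inline std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);  // ASCII range: one byte, same value
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encode a 7-bit ASCII string one code point at a time.
inline std::string asciiToUtf8(const std::string& ascii) {
    std::string out;
    for (unsigned char c : ascii) {
        assert(c < 0x80);  // the caller promises pure ASCII
        out += encodeUtf8(c);
    }
    return out;
}
```

For ASCII input the output is byte-for-byte identical to the input.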

Speaking of UTF-8 vs. ASCII I suggest changing the O*String methods 
such as createFromAscii() to createFromUtf8().

>>> [...]
>>>      ::rtl::OUStringToOString(sOUStringVariable,
>>> RTL_TEXTENCODING_ASCII_US).getStr()
>>
>> This awful construct could be made much simpler if our strings were
>> always unicode (UTF-8/UTF-16/UTF-32).
>
> I thought that OUString is UTF-16 and that that was the cause, not the
> solution, of the conversion problems.

The complexity of the awful construct comes from the use of the general 
purpose machinery for an N:1 conversion (with N being the number of 
supported byte encodings). A 1:1 conversion (UTF-8 <-> UTF-16) is much 
simpler.
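
To give an idea of how small the 1:1 case is, here is a sketch of a 
UTF-8 to UTF-16 conversion written against the standard library only. 
The function name utf8ToUtf16 is hypothetical and this is not the AOO 
implementation. It rejects truncated sequences and bad continuation 
bytes, but as a simplification it does not reject overlong forms or 
encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

inline std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // one UTF-16 code unit
        } else {
            cp -= 0x10000;                     // split into a surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}

// ASCII passes through unchanged, one code unit per byte.
inline void checkAsciiPassThrough() {
    assert(utf8ToUtf16("A") == u"A");
}
```

No encoding tables or encoding arguments are needed, which is the 
simplification compared with the N:1 machinery.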

As I wrote I'd even like to go full UTF-8 inside AOO. Most of the back 
and forth transcodings between UTF-8 and UTF-16 inside AOO are just 
wasteful.

Herbert

