openoffice-dev mailing list archives

From Herbert Duerr <...@apache.org>
Subject Re: Improvements of OUString
Date Tue, 03 Dec 2013 09:35:17 GMT
On 03.12.2013 09:13, Andre Fischer wrote:
> A developer who apparently wants to remain anonymous has added the
> function isEmpty() to the rtl::OUString class.  See
> main/sal/inc/rtl/ustring.hxx for not much more information.

Sorry for being too short. The full semantics of isEmpty() are:

"The method isEmpty() returns true if the string is empty. If the length 
of the string is one or two or three or any number bigger than zero then 
isEmpty() returns false."

I added isEmpty() to make it possible to cleanly express the check for 
an empty string. In our codebase there were quite a few constructs such as
	if( aString) {}
which were intended to mean
	if( !aString.isEmpty()) {}
What's funny is that the old construct compiled but it did the wrong 
thing: The string was implicitly converted to a pointer to its elements 
and that pointer was then compared against NULL. For our OUString that 
pointer was always non-NULL though.
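
A minimal sketch of this pitfall, assuming a hypothetical LegacyString
class that stands in for the old behaviour (it is not the real
rtl::OUString, and oldStyleCheck() is an illustrative name):

```cpp
// Hypothetical stand-in for a string class that is implicitly convertible
// to a pointer to its elements. This is not the real rtl::OUString.
class LegacyString {
public:
    explicit LegacyString(const char16_t* text) : m_text(text) {}

    // The implicit conversion that makes "if (s)" compile.
    operator const char16_t*() const { return m_text; }

    bool isEmpty() const { return m_text[0] == u'\0'; }

private:
    const char16_t* m_text;  // never null in this sketch, even when empty
};

// Mirrors the old construct: the string converts to a pointer, which is
// then compared against null. The result says nothing about the length.
inline bool oldStyleCheck(const LegacyString& s) {
    if (s) {
        return true;
    }
    return false;
}
```

With this sketch, oldStyleCheck() returns true for both an empty and a
non-empty string, while isEmpty() tells them apart.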

Please see issue 123068 for further problems caused by the implicit 
conversion of the OUString to a pointer to its elements. This dangerous 
conversion is now disabled: the conversion operator has been made 
private, so the compiler finds and rejects every such use. Once we're 
confident that all uses have been found, the operator can be removed 
completely.
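
The effect of a private conversion operator can be sketched as follows.
GuardedString is a hypothetical stand-in, not the real rtl::OUString,
and the static_assert is only one way to show that outside code can no
longer use the conversion:

```cpp
#include <type_traits>

// Hypothetical stand-in, not the real rtl::OUString.
class GuardedString {
public:
    explicit GuardedString(const char16_t* text) : m_text(text) {}

    bool isEmpty() const { return m_text[0] == u'\0'; }
    const char16_t* getStr() const { return m_text; }  // explicit access

private:
    // Still declared, but inaccessible: outside code that relies on the
    // implicit conversion, such as "if (s)", no longer compiles.
    operator const char16_t*() const { return m_text; }

    const char16_t* m_text;
};

// Outside the class the conversion is not usable any more.
static_assert(!std::is_convertible<GuardedString, const char16_t*>::value,
              "implicit conversion to a pointer must be rejected");
```

Callers that really need the raw pointer have to ask for it explicitly,
here through getStr().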

> This in itself may not yet be very exciting but I hope that it is the
> first of several improvements to one of our most frequently used
> classes.  Sadly, we missed the opportunity to make some more substantial
> but incompatible changes for the 4.0 release. However, some changes that
> make OUString more accessible to new (and old) developers might include:
>
> - Make construction from string literal more straightforward.  At the
> moment you have to write
>      ::rtl::OUString("text", sizeof("text") - 1, RTL_TEXTENCODING_ASCII_US)
>    or slightly shorter and safer
>      ::rtl::OUString::createFromAscii("text")

Allocating heap space, transcoding a string literal into that memory and 
deallocating it later when the string is destroyed are quite wasteful 
operations, especially considering that the literal is already there. 
It would be great if constructs such as
	OUString( L"hello")
used the pointer to the UTF-16 literal directly instead of copying its 
contents around. The same applies to OString. The 'L' prefix yields 
UTF-16 only where wchar_t is 16 bits wide, as on Windows, but C++11 
offers portable alternatives with its u8"", u"" and U"" Unicode string 
literals.
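
A minimal sketch of the idea of using a literal in place, assuming a
hypothetical non-owning LiteralView class (the name and interface are
illustrative, not an existing API) and a C++11 u"" literal:

```cpp
#include <cstddef>
#include <string>

// Hypothetical non-owning view over a UTF-16 string literal.
class LiteralView {
public:
    // The array reference captures the literal's size at compile time;
    // N includes the terminating null character.
    template <std::size_t N>
    constexpr LiteralView(const char16_t (&literal)[N])
        : m_data(literal), m_length(N - 1) {}

    constexpr const char16_t* data() const { return m_data; }
    constexpr std::size_t length() const { return m_length; }

    // Copy only when an owning string is really needed.
    std::u16string toOwned() const {
        return std::u16string(m_data, m_length);
    }

private:
    const char16_t* m_data;   // points at the literal, no allocation
    std::size_t m_length;     // length without the terminator
};
```

Constructing LiteralView(u"hello") neither allocates nor transcodes; the
view only stores the literal's address and its length.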

Also, we shouldn't burden our main string classes with non-Unicode 
support. External tooling for converting to and from other encodings 
would still be needed, though.

Looking over our string processing, I'm confident that we could get 
along great with UTF-8 strings. A conversion to UTF-16 would be needed 
only when interfacing with other APIs that require it.

And if we were using UTF-8 byte strings we could base them directly on 
the standard std::string.
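
As a rough sketch of such a boundary conversion, assuming text is kept
as UTF-8 in std::string: utf8ToUtf16() is a hypothetical helper name,
and the code assumes well-formed input and performs no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts UTF-8 text to UTF-16 at an API boundary.
// Assumes the input is well-formed UTF-8; no validation is performed.
inline std::u16string utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 0; k < extra && i < utf8.size(); ++k, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Production code would also have to reject malformed sequences, which
this sketch deliberately leaves out.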

> - Conversion back to char* is not much better
>      ::rtl::OUStringToOString(sOUStringVariable,
> RTL_TEXTENCODING_ASCII_US).getStr()

This awful construct could be made much simpler if our strings were 
always Unicode (UTF-8/UTF-16/UTF-32).

> Do you have more ideas?

We could borrow ideas from languages such as Python, Perl and Java for 
convenient and powerful string processing, to replace the awkward string 
handling that is too often seen in our code base. For example, 
regexp-enabled match() or search() methods would be a great start.
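
A sketch of what such helpers might look like, assuming UTF-8 text in
std::string and the C++11 <regex> library. The free functions match()
and search() are hypothetical and not part of any existing string class:

```cpp
#include <regex>
#include <string>

// Hypothetical convenience wrappers, named after the match() and search()
// ideas above. They are not part of any existing string class.

// True if the whole text matches the pattern.
inline bool match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns the first substring matching the pattern, or an empty string.
inline std::string search(const std::string& text,
                          const std::string& pattern) {
    std::smatch found;
    if (std::regex_search(text, found, std::regex(pattern))) {
        return found.str();
    }
    return std::string();
}
```

Note that std::regex works on bytes here, so patterns that must treat a
multi-byte UTF-8 character as one unit would need more care.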

Herbert

