trafodion-dev mailing list archives

From Dave Birdsall <dave.birds...@esgyn.com>
Subject RE: About UCS2
Date Mon, 13 Nov 2017 02:35:23 GMT
Hi,

UCS2 and UTF8 are both encodings of the Unicode character set.

UCS2 is an older encoding. Each character is encoded as 16 bits, so it can represent only
the first 65,536 Unicode code points (the Basic Multilingual Plane) rather than the entire
Unicode character set. That range includes more than 20,000 of the most commonly used
Chinese characters. For historical reasons, the Trafodion metadata tables use UCS2 for
columns containing object names.

On the other hand, UTF8 is a newer encoding, and can encode the entire Unicode character set.
There are other encodings, such as UTF-16 and UTF-32, which Trafodion does not currently support.

So, there are characters in Unicode that can be represented in UTF8 that cannot in UCS2.
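
As a rough illustration (a minimal Python sketch, not Trafodion code), a character fits in UCS2 only if its code point is at most U+FFFF. The helper name fits_in_ucs2 and the two sample characters are my own choices for the example:

```python
def fits_in_ucs2(ch: str) -> bool:
    """Return True if the single character ch has a code point of at most U+FFFF."""
    return ord(ch) <= 0xFFFF


common = "\u4e2d"      # U+4E2D, a common Chinese character
rare = "\U00020000"    # U+20000, a CJK Extension B ideograph outside the 16-bit range

for ch in (common, rare):
    # Print the code point, whether UCS2 can hold it, and its UTF-8 length in bytes.
    print(f"U+{ord(ch):04X}", fits_in_ucs2(ch), len(ch.encode("utf-8")))
```

The first character fits in UCS2 and takes 3 bytes in UTF8; the second does not fit in UCS2 and takes 4 bytes in UTF8.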

In terms of performance, it's a mixed bag. It depends on the data you are storing and what
you are doing with it. For example, Chinese characters typically take 3 bytes in UTF8, but
2 bytes in UCS2. On the other hand, some of the less frequently used Chinese characters can
be represented only in UTF8 (where they take 4 bytes), while ASCII characters take 1 byte
in UTF8 and 2 bytes in UCS2.

If you have a mix of ASCII data and Chinese characters stored in a column, the most efficient
character set will depend on the ratio of ASCII to Chinese characters.
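
To make the trade-off concrete, here is a small Python sketch that compares encoded sizes. It assumes Python's "utf-16-be" codec as a stand-in for UCS2, which is byte-for-byte the same as long as every character has a code point of at most U+FFFF. The function name encoded_sizes and the sample strings are only for illustration:

```python
def encoded_sizes(text: str) -> tuple:
    """Return (utf8_bytes, ucs2_bytes) for text limited to code points <= U+FFFF."""
    utf8_bytes = len(text.encode("utf-8"))
    # Assumption: UTF-16 without surrogates is identical to UCS2 for such text.
    ucs2_bytes = len(text.encode("utf-16-be"))
    return utf8_bytes, ucs2_bytes


samples = {
    "ascii only": "Trafodion",
    "chinese only": "\u6570\u636e\u5e93",          # three common Chinese characters
    "mixed": "Trafodion\u6570\u636e\u5e93",
}

for label, text in samples.items():
    utf8_bytes, ucs2_bytes = encoded_sizes(text)
    print(f"{label}: UTF8={utf8_bytes} bytes, UCS2={ucs2_bytes} bytes")
```

Each ASCII character saves one byte in UTF8, and each 3-byte Chinese character saves one byte in UCS2, so the two encodings break even when a column holds roughly equal numbers of each.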

If you are doing a lot of string operations such as SUBSTRING or POSITION, UCS2 is more efficient
since it is a fixed-width encoding. (I can go directly to the 10th character, for example,
because it always starts at byte offset 18, but in UTF8 one has to start at the beginning of
the string and count characters.)
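
The difference can be sketched in Python as follows. This is only an illustration of the idea, not how Trafodion implements SUBSTRING; the function names are hypothetical, and big-endian byte order is assumed for the UCS2 data:

```python
def nth_char_ucs2(data: bytes, n: int) -> str:
    """Character n (0-based) of big-endian UCS2 data: one direct 2-byte slice."""
    if not 0 <= n < len(data) // 2:
        raise IndexError("character index out of range")
    return data[2 * n : 2 * n + 2].decode("utf-16-be")


def nth_char_utf8(data: bytes, n: int) -> str:
    """Character n (0-based) of UTF-8 data: scan from the start, counting lead bytes."""
    if n < 0:
        raise IndexError("character index out of range")
    count = -1
    start = None
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:      # not a continuation byte: a new character starts here
            count += 1
            if count == n:
                start = i         # found the first byte of character n
            elif count == n + 1:
                return data[start:i].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return data[start:].decode("utf-8")   # character n was the last one


text = "ab\u4e2dcd"
print(nth_char_ucs2(text.encode("utf-16-be"), 2) == nth_char_utf8(text.encode("utf-8"), 2))
```

The UCS2 lookup costs the same no matter how large n is, while the UTF8 lookup has to examine every byte before the character it wants.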

I hope this helps,

Dave



-----Original Message-----
From: Liu, Yuan (Yuan) [mailto:yuan.liu@esgyn.cn] 
Sent: Sunday, November 12, 2017 6:18 PM
To: dev@trafodion.incubator.apache.org
Subject: About UCS2

Hi Trafodioneers,

Trafodion has three main character sets: ISO88591, UTF8 and UCS2.

As far as I know, ISO88591 is the default character set when we define char/varchar columns;
it is a single-byte character set.
UTF8 is mainly used when we want to store Unicode text such as Chinese characters.
But what about UCS2? I have never used UCS2 before. What is a suitable use case for it?

Best regards,
Yuan

