trafodion-dev mailing list archives

From Kevin DeYager <kevin.deya...@esgyn.com>
Subject RE: enhance TRANSLATE to support Chinese charset?
Date Tue, 05 Jan 2016 00:11:25 GMT
Hi Ming,

I am no expert in this area, but is GB18030 translation also needed /
desirable?

Regards,
- Kevin

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Monday, December 21, 2015 4:51 PM
To: dev@trafodion.incubator.apache.org
Subject: enhance TRANSLATE to support Chinese charset?

Hello,

Trafodion currently has a TRANSLATE function, which can do charset
conversion among ISO88591, SJIS, UCS2 and UTF8.
I would like to add GBK conversion to this function; it can help with data
loading. As we saw previously, source data is very often encoded in GB2312,
especially in China, so we have to run 'iconv' from GBK to UTF8 before
loading. If the data files are huge, this takes some time.
If TRANSLATE supported GBKTOUTF8, the conversion could be done in one step
during the 'LOAD' SQL command. I think there are some other use cases as
well.
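
As a rough illustration of the separate pre-load conversion step described
above (this is not Trafodion code), a streaming GBK to UTF8 conversion might
look like the Python sketch below. The function name gbk_to_utf8_chunks is
hypothetical, and Python's built-in "gbk" codec stands in for 'iconv'.

```python
import codecs


def gbk_to_utf8_chunks(chunks):
    """Yield UTF-8 bytes for an iterable of GBK-encoded byte chunks.

    Hypothetical helper: an incremental decoder is used so that a
    multi-byte character split across two chunks is still decoded correctly.
    """
    decoder = codecs.getincrementaldecoder("gbk")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text.encode("utf-8")
    # Flush the decoder; this raises UnicodeDecodeError if the input
    # ended in the middle of a multi-byte character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail.encode("utf-8")
```

Every byte of the source file passes through this extra step before the load
starts, which is the cost a built-in GBKTOUTF8 translation would avoid.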

Do you think this is worthwhile? If so, I would like to file a JIRA and can
work on it.

At first glance, I would like to propose several translate flavors:
GBKTOUTF8N: try to do conversion from GB2312 to UTF8; if there is an error
during the conversion, return NULL, raise no SQL error, and silently
continue.
GBKTOUTF8O: try to do conversion from GB2312 to UTF8; if there is an error
during the conversion, return the original string without any conversion,
raise no SQL error, and silently continue.
GBKTOUTF8: typical behavior; once there is a conversion error, raise a SQL
error.
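
To make the difference between the three proposed flavors concrete, here is
a minimal Python sketch of the error-handling semantics only. It is not
Trafodion code: the function names are hypothetical, Python's built-in "gbk"
codec stands in for the conversion routine, None stands in for SQL NULL, and
a Python exception stands in for a SQL error.

```python
from typing import Optional


def gbk_to_utf8(data: bytes) -> bytes:
    """Sketch of GBKTOUTF8: strict conversion, raises on invalid input."""
    return data.decode("gbk").encode("utf-8")


def gbk_to_utf8_n(data: bytes) -> Optional[bytes]:
    """Sketch of GBKTOUTF8N: return None (SQL NULL) on a conversion error."""
    try:
        return gbk_to_utf8(data)
    except UnicodeDecodeError:
        return None


def gbk_to_utf8_o(data: bytes) -> bytes:
    """Sketch of GBKTOUTF8O: return the original bytes on a conversion error."""
    try:
        return gbk_to_utf8(data)
    except UnicodeDecodeError:
        return data


if __name__ == "__main__":
    good = "\u4e2d\u6587".encode("gbk")  # valid GBK input
    bad = good + b"\xff"                 # 0xFF is not a valid GBK lead byte
    print(gbk_to_utf8(good))
    print(gbk_to_utf8_n(bad))
    print(gbk_to_utf8_o(bad))
```

With the N and O variants a load keeps going past bad rows, while the plain
variant stops at the first conversion error.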

Thanks,
Ming
