trafodion-dev mailing list archives

From "Liu, Ming (Ming)" <ming....@esgyn.cn>
Subject Re: enhance TRANSLATE to support Chinese charset?
Date Tue, 05 Jan 2016 01:01:56 GMT
Hi, Kevin,

I had not noticed GB18030 before, but after some initial searching it looks like a must-have feature,
so Trafodion should support it. I will schedule it after the GBK support: we have seen GBK at a real
customer site, but not GB18030 yet. Still, we should assume GB18030 will be widely required very soon.

Thanks,
Ming

-----Original Message-----
From: Kevin DeYager [mailto:kevin.deyager@esgyn.com]
Sent: January 5, 2016 8:11
To: dev@trafodion.incubator.apache.org
Subject: RE: enhance TRANSLATE to support Chinese charset?

Hi Ming,

I am no expert in this area, but is GB18030 translation also needed / desirable?

Regards,
- Kevin

-----Original Message-----
From: Liu, Ming (Ming) [mailto:ming.liu@esgyn.cn]
Sent: Monday, December 21, 2015 4:51 PM
To: dev@trafodion.incubator.apache.org
Subject: enhance TRANSLATE to support Chinese charset?

Hello,

Trafodion currently has a TRANSLATE function that can convert among the ISO88591,
SJIS, UCS2 and UTF8 character sets.
I would like to add GBK conversion to this function; it can help with data loading.
As we have seen previously, source data is very often encoded in GB2312 (which GBK extends),
especially in China, so we have to run 'iconv' from GBK to UTF8 before loading. If the data
files are huge, this takes some time.
If TRANSLATE supported GBKTOUTF8, the conversion could be done in one step during the
'LOAD' SQL command. I think there are some other use cases as well.
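
For illustration only, the separate pre-load conversion step mentioned above amounts to something like this Python sketch of a streaming GBK-to-UTF-8 conversion. The function name `convert_gbk_to_utf8` and the chunk size are assumptions made for this example; they are not part of Trafodion or iconv.

```python
import codecs
import io


def convert_gbk_to_utf8(src, dst, chunk_size=64 * 1024):
    """Read GBK bytes from src in chunks and write UTF-8 bytes to dst.

    Hypothetical helper: roughly what `iconv -f GBK -t UTF-8` does
    before a load. Raises UnicodeDecodeError on invalid input.
    """
    # An incremental decoder buffers a multi-byte character that is
    # split across two chunks instead of failing on it.
    decoder = codecs.getincrementaldecoder("gbk")(errors="strict")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode("utf-8"))
    # Flush; this raises if the input ended in the middle of a character.
    dst.write(decoder.decode(b"", final=True).encode("utf-8"))


if __name__ == "__main__":
    source = io.BytesIO("中文数据".encode("gbk"))
    target = io.BytesIO()
    convert_gbk_to_utf8(source, target)
    print(target.getvalue().decode("utf-8"))
```

Every byte of a large file passes through a step like this before the load starts, which is the extra time a built-in conversion would avoid.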

Do you think this is worthwhile? If so, I would like to file a JIRA and work on it.

At first glance, I would like to propose several TRANSLATE flavors:
GBKTOUTF8N: tries to convert from GB2312 to UTF8; if there is an error
during the conversion, returns NULL, raises no SQL error, and silently continues.
GBKTOUTF8O: tries to convert from GB2312 to UTF8; if there is an error during the
conversion, returns the original string unconverted, raises no SQL error, and silently
continues.
GBKTOUTF8: typical behavior; once there is a conversion error, raises a SQL error.
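
To make the difference between the three flavors concrete, here is a minimal Python sketch of the proposed error handling. It only models the semantics with Python's built-in `gbk` codec; the function names are hypothetical and this is not how TRANSLATE would be implemented inside Trafodion.

```python
from typing import Optional


def gbk_to_utf8(data: bytes) -> bytes:
    """Strict flavor: a conversion error propagates (like a SQL error)."""
    return data.decode("gbk").encode("utf-8")


def gbk_to_utf8_n(data: bytes) -> Optional[bytes]:
    """'N' flavor: return None (modelling SQL NULL) on a conversion error."""
    try:
        return gbk_to_utf8(data)
    except UnicodeDecodeError:
        return None


def gbk_to_utf8_o(data: bytes) -> bytes:
    """'O' flavor: return the original bytes unchanged on a conversion error."""
    try:
        return gbk_to_utf8(data)
    except UnicodeDecodeError:
        return data


if __name__ == "__main__":
    good = "中文".encode("gbk")
    bad = b"abc\xff"  # 0xFF is not a valid GBK lead byte
    print(gbk_to_utf8_n(good), gbk_to_utf8_n(bad))
    print(gbk_to_utf8_o(good), gbk_to_utf8_o(bad))
```

In this model the 'N' and 'O' variants let a bulk load continue past bad rows, while the strict variant stops at the first invalid byte sequence.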

Thanks,
Ming