ibatis-user-java mailing list archives

From Brandon Goodin <brandon.goo...@gmail.com>
Subject Re: Help needed...
Date Wed, 20 Apr 2005 15:56:32 GMT
I did work with a Japanese site and we used Shift_JIS, which is a
legacy Japanese encoding rather than a Unicode form. We would store
Shift_JIS into the database but then we had
some issues reading the stored data from the database. The characters
were entered as Shift_JIS and stored as UCS-2 (UTF-16) in SQL Server.
We tried reading them straight from the database and displaying them
on screen without any byte encoding conversion. But, they wound up
looking all wrong. The browser did not handle the conversion properly.
We then read the data from the database and used the Java
String.getBytes(String charsetName) method to re-encode it.
However, String.getBytes did not work properly for us. We
wound up writing our own conversion that was quite simple and
everything worked. So, as far as I know, all the characters that can
be represented in UTF-8 can also be represented in UTF-16, and it is
possible to convert back and forth between the two without loss,
since both encode the same set of Unicode code points. They differ
only in how many bytes each character takes. Maybe I'm wrong. But,
that is my impression with all of this.
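
A minimal sketch of that round trip, using only the JDK's standard
charset support. The class name, method names and sample text are
hypothetical, it assumes a JDK with java.nio.charset.StandardCharsets
(Java 7 or later), and it is not the custom conversion mentioned
above. It only illustrates that the same text survives a trip through
UTF-8 and UTF-16, with different byte counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Encode text to bytes in the given charset, then decode those bytes back.
    public static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    // Number of bytes the text occupies in the given charset.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Latin letters, three Japanese characters, and U+1D11E,
        // a supplementary character that needs a surrogate pair in UTF-16.
        String sample = "abc \u65E5\u672C\u8A9E \uD834\uDD1E";

        Charset[] unicodeForms = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE};
        for (Charset cs : unicodeForms) {
            System.out.println(cs + ": " + byteLength(sample, cs) + " bytes, lossless="
                    + sample.equals(roundTrip(sample, cs)));
        }

        // Shift_JIS is an optional charset in the JDK, so check before using it.
        if (Charset.isSupported("Shift_JIS")) {
            Charset sjis = Charset.forName("Shift_JIS");
            String japanese = "\u65E5\u672C\u8A9E";
            System.out.println(sjis + ": lossless="
                    + japanese.equals(roundTrip(japanese, sjis)));
        }
    }
}
```

Passing the charset explicitly on both the encode and the decode side
is the point of the sketch: relying on the platform default on either
side is a common way to end up with garbled text.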

Brandon
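
For the UTFDataFormatException quoted at the bottom of this thread, a
small check along these lines may help confirm whether a file's bytes
are really well-formed UTF-8. This is only a sketch: the class name,
method name and sample word are made up for illustration, and it
assumes a JDK with java.nio.charset.StandardCharsets (Java 7 or
later). It uses the standard CharsetDecoder in strict mode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Strict decoding failed, so the bytes are not valid UTF-8.
            return false;
        }
    }

    public static void main(String[] args) {
        // A Portuguese word containing c-cedilla and a-tilde.
        String text = "configura\u00E7\u00E3o";

        // Simulate an editor saving the same text in two different encodings.
        byte[] savedAsLatin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        byte[] savedAsUtf8 = text.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8? " + isValidUtf8(savedAsLatin1));
        System.out.println("UTF-8 bytes valid as UTF-8? " + isValidUtf8(savedAsUtf8));
    }
}
```

The accented letters themselves are fine in UTF-8. The failure comes
from bytes written in one encoding being read as another, which is
why either declaring ISO-8859-1 in the XML declaration or saving the
file as UTF-8 resolves it.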

On 4/20/05, Miquel Angel Bada Zuazo <mabada@gmail.com> wrote:
> UTF-8 is for almost all languages (it uses 8 bits to represent a
> letter, I think), but "complicated" languages such as Japanese and Thai
> use 16 bits, so that's the reason for UTF-16 overall.
> 
> Miquel Angel
> 
> On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > I've done quite a bit with i18n working between UTF-8 and UTF-16. Even
> > after all that... I'm still mystified. :D Encoding is a world unto
> > itself. All i want is something that works :) Maybe one of these days
> > i'll understand more... for now it's all about trial and error.
> >
> > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > I don't see anywhere in there that UTF-8 cannot encode everything that
> > > UTF-16 and UTF-32 can ... just that the storage requirements differ ?!
> > >
> > > Brice
> > >
> > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > http://icu.sourceforge.net/docs/papers/forms_of_unicode/
> > > >
> > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > I had heard that chinese does a lot with UTF-16, but I hadn't heard
> > > > > about arabic ... and I don't exactly understand why UTF-8 doesn't
> > > > > support that ... is it simply because their character sets keep
> > > > > expanding and UTF-8 is static?
> > > > >
> > > > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > > > > Latin characters are fine. However, UTF-8 is not sufficient for several
> > > > > > > languages like Arabic and Chinese. For their FULL range of character
> > > > > > > representations these languages require UTF-16 and in the case of
> > > > > > > Chinese it is pushing for UTF-32.
> > > > > >
> > > > > > Brandon
> > > > > >
> > > > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > > > > OK ... that's more reasonable. Obviously, you need to use an editor
> > > > > > > > (such as Eclipse) that is capable of editing UTF-8 files, otherwise,
> > > > > > > > you'll get junk and that won't be fun.
> > > > > > >
> > > > > > > Whew ... glad UTF-8 isn't compromised :)
> > > > > > >
> > > > > > > > On 4/20/05, Brandon Goodin <brandon.goodin@gmail.com> wrote:
> > > > > > > > I found this quote when doing a search in google:
> > > > > > > >
> > > > > > > > --- quote ---
> > > > > > > >
> > > > > > > > > Your actual problem is very typical. By default (without encoding
> > > > > > > > > specified in the XML declaration), XML is encoded in UTF-8. If you use
> > > > > > > > > an editor which is not encoding-aware and typically assuming an
> > > > > > > > > ISO-8859-1 encoding, and you insert characters such as accented
> > > > > > > > > letters, curly quotes, etc., you will get this error. As a workaround,
> > > > > > > > > you can put an XML declaration with the ISO-8859-1 encoding at the top
> > > > > > > > > of your XML file:
> > > > > > > >
> > > > > > > > <?xml version="1.0" encoding="ISO-8859-1"?>
> > > > > > > >
> > > > > > > > > You can also use an editor which knows how to handle UTF-8.
> > > > > > > >
> > > > > > > > > In your case it is also possible that somebody inserted incorrect
> > > > > > > > > characters by accident, and you can just remove those and then decide
> > > > > > > > > which encoding you want to use. UTF-8 gives you the whole range of
> > > > > > > > > Unicode, while ISO-8859-1 gives you a limited set of characters that
> > > > > > > > > work for the Western languages.
> > > > > > > >
> > > > > > > > --- quote ---
> > > > > > > >
> > > > > > > > maybe that will help,
> > > > > > > > Brandon
> > > > > > > >
> > > > > > > > On 4/20/05, Brice Ruth <bdruth@gmail.com> wrote:
> > > > > > > > > > What special characters aren't supported by UTF-8?! I have never heard
> > > > > > > > > > of such a thing. My understanding is that UTF-8 represents the full
> > > > > > > > > > Unicode character set as a multi-byte value. And since Unicode is
> > > > > > > > > > supposed to encompass all known characters for all known languages
> > > > > > > > > > (with space for new Chinese characters created daily) - what's not
> > > > > > > > > > covered?!
> > > > > > > > > >
> > > > > > > > > > There most certainly shouldn't be anything that iso-8859-1 or latin1
> > > > > > > > > > (Windows-1252) covers that is not in Unicode.
> > > > > > > > >
> > > > > > > > > Brice
> > > > > > > > >
> > > > > > > > > > On 4/20/05, Daniel H. F. e Silva <dhfs@yahoo.com> wrote:
> > > > > > > > > > > You could also check your XML encoding. If you work with special
> > > > > > > > > > > characters not in UTF-8, you will get in trouble.
> > > > > > > > > > > I had this, as my native language is Portuguese and we have some
> > > > > > > > > > > special characters not supported by UTF-8.
> > > > > > > > > > > So, if this is your case, try ISO-8859-1 or one that better fits
> > > > > > > > > > > your needs.
> > > > > > > > > >
> > > > > > > > > > Cheers,
> > > > > > > > > >  Daniel Silva.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > --- Larry Meadors <larry.meadors@gmail.com> wrote:
> > > > > > > > > > > > Make sure that there is no white space and no odd chars at the top of your
> > > > > > > > > > > > config file.
> > > > > > > > > > >
> > > > > > > > > > > Larry
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On 4/18/05, KK <kkn006@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > I get the following error when I try to build sqlCOnfigmap..does it
> > > > > > > > > > > > > look familiar to someone?
> > > > > > > > > > > >
> > > > > > > > > > > > > com.ibatis.sqlmap.client.SqlMapException: There was an error while
> > > > > > > > > > > > > building the SqlMap instance.
> > > > > > > > > > > > > --- The error occurred in the SQL Map Configuration file.
> > > > > > > > > > > > > --- Cause: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8
> > > > > > > > > > > > > sequence.
> > > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte
> > > > > > > > > > > > > UTF-8 sequence.
> > > > > > > > > > > > > Caused by: com.ibatis.sqlmap.client.SqlMapException: XML Parser Error.
> > > > > > > > > > > > > Cause: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte UTF-8
> > > > > > > > > > > > > sequence.
> > > > > > > > > > > > > Caused by: java.io.UTFDataFormatException: Invalid byte 3 of 3-byte
> > > > > > > > > > > > > UTF-8 sequence.
> > > > > > > > > > > > > at com.ibatis.sqlmap.engine.builder.xml.XmlSqlMapClientBuilder.buildSqlMap(XmlSqlMapClientBuilder.java:203)
> > > > > > > > > > > > > at com.ibatis.sqlmap.client.SqlMapClientBuilder.buildSqlMapClient(SqlMapClientBuilder.java:49)
> > > > > > > > > > > >
> > > > > > > > > > > > Your help is greatly appreciated.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > KK
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Brice Ruth
> > > > > > > > > Software Engineer, Madison WI
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Brice Ruth
> > > > > > > Software Engineer, Madison WI
> > > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Brice Ruth
> > > > > Software Engineer, Madison WI
> > > > >
> > > >
> > >
> > > --
> > > Brice Ruth
> > > Software Engineer, Madison WI
> > >
> >
>
