ant-user mailing list archives

From "Brian C. Hill" <>
Subject encoding problem with tokens (on Linux)
Date Wed, 19 May 2010 18:11:05 GMT
  I am having a weird problem with the encoding during token replacement 
(ant code below).

Details: CentOS 5, SunOS 5, Darwin 9 // ant 1.6/1.7 // java 1.3/1.5/1.6

I have a simple token file with a simple variable ('A') and a single 
Chinese character (广 or cat -v: M-eM-9M-?).

The template is just "Hello @A@."

The file command says that the tokens file is UTF-8 and the template file 

If I specify UTF-8 or don't specify an encoding, the encoding gets messed 
up: the 3-byte Chinese character is replaced with 4 bytes that print 
as nonsense. BUT, if I specify latin1 for the encoding, the Chinese 
character is preserved properly. Note that I also tried putting a 
Chinese character into the template as well, to get the file command to 
see the template file as UTF-8; that didn't help the token replacement 
when I went back to UTF-8 encoding. Interestingly, the Chinese character 
in the template itself is preserved regardless of the copy encoding 
used (again, the token replacement still gets messed up under UTF-8).

I simply cannot explain this. What am I missing? If all of the files are 
seen as UTF-8 and the Chinese character indeed seems to be encoded as a 
3-byte UTF-8 character (as opposed to unicode), what does latin1 have to 
do with this?
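
One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: if the tokens file is loaded through java.util.Properties.load(InputStream), which always decodes bytes as ISO-8859-1, then each of the three UTF-8 bytes becomes its own character, regardless of the encoding given on the copy. Writing those characters back out as latin1 would restore the original bytes, while writing them as UTF-8 would not. The sketch below only illustrates that byte arithmetic in plain Java; the class and method names are made up for the example, and it assumes (without checking the Ant source) that the filtersfile is read this way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class TokenEncodingDemo {

    /** Formats bytes as space-separated uppercase hex, e.g. "E5 B9 BF". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * Loads a token value from raw file bytes with Properties.load(InputStream),
     * which decodes the stream as ISO-8859-1. Assumption: this mirrors how the
     * tokens file is read during the copy.
     */
    static String loadToken(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // A tokens file "A=<U+5E7F>" saved as UTF-8, as in the message.
        byte[] tokensFile = "A=\u5E7F\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex("\u5E7F".getBytes(StandardCharsets.UTF_8)));
        // prints "E5 B9 BF"

        String value = loadToken(tokensFile, "A");
        System.out.println(value.length());
        // prints 3: one character per original byte

        System.out.println(hex(value.getBytes(StandardCharsets.UTF_8)));
        // prints "C3 A5 C2 B9 C2 BF": the garbled result of writing as UTF-8

        System.out.println(hex(value.getBytes(StandardCharsets.ISO_8859_1)));
        // prints "E5 B9 BF": writing as latin1 restores the original bytes
    }
}
```

Note that this sketch yields six bytes in the UTF-8 case rather than the four reported above, so it may not be the whole story. If the hypothesis does hold, writing the token value as the escape \u5E7F in the tokens file (a form that Properties.load does understand) would be one way to test it.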

Any clues will be appreciated!


<project name="TokenReplacement" default="replace.tokens" basedir="../.">
<property name="replace.dir" value="/tmp/ant-char-probl
<property name="tokens.file" value="/tmp/ant-char-probl
  <target name="replace.tokens">
    <copy todir="${replace.dir}" overwrite="true" encoding="utf-8">
      <fileset dir="${replace.dir}">
        <include name="**/*.tmpl"/>
      </fileset>
      <filterset begintoken="@" endtoken="@">
        <filtersfile file="${tokens.file}"/>
      </filterset>
      <mapper type="glob" from="*.tmpl" to="*"/>
    </copy>
  </target>
</project>
