ant-user mailing list archives

From Alec Fernandez <>
Subject RE: encoding problem with tokens (on Linux)
Date Thu, 20 May 2010 12:52:41 GMT
I assumed that filters files had to be in properties file syntax, but I do not know this to be
the case.  If this is true, then I would expect your tokens.file to have something that looks like:
(where 1234 is the escaped Unicode representation of your cat glyph)
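
For illustration only, here is a minimal Python sketch of how such an escaped line could be produced for the token 'A' and the character quoted below. The helper names (escape_for_properties, properties_line) are hypothetical, and the sketch assumes the tokens file really is read with properties-file rules. It escapes only non-ASCII characters and does not handle other properties-file specials such as backslashes or a leading space.

```python
def escape_for_properties(text: str) -> str:
    """Return text with every non-ASCII character written as \\uXXXX escapes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Properties escapes are UTF-16 code units, so characters outside
            # the BMP become two \uXXXX escapes (a surrogate pair).
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                code_unit = int.from_bytes(units[i:i + 2], "big")
                out.append("\\u{:04X}".format(code_unit))
    return "".join(out)


def properties_line(key: str, value: str) -> str:
    """Build a key=value line with the value escaped (hypothetical helper)."""
    return "{}={}".format(key, escape_for_properties(value))


if __name__ == "__main__":
    # Token name 'A' and the character from the original message.
    print(properties_line("A", "广"))
```

A line written this way is pure ASCII, so it reads the same regardless of which encoding is used to load the tokens file.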

If so, then the token will be replaced correctly.  You explicitly set the encoding on the copy
(which is also used for output unless outputencoding is given), so the char will be written as
the utf-8 representation of the cat glyph.
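
As a rough check of this reasoning, the Python sketch below simulates what would happen if raw UTF-8 bytes in the tokens file were read as ISO-8859-1 (the encoding java.util.Properties.load(InputStream) uses) and then written out again. Whether Ant loads the filters file this way is the assumption here, not something confirmed in this thread, and the byte counts Brian observed may differ from this simulation.

```python
# Simulate reading raw UTF-8 bytes as ISO-8859-1, then writing them back out.
glyph = "广"

utf8_bytes = glyph.encode("utf-8")         # the 3 bytes stored in the tokens file
misread = utf8_bytes.decode("iso-8859-1")  # each byte becomes its own character

# Writing the misread text as UTF-8 encodes each of the 3 characters
# separately, so the original glyph is lost (mojibake).
written_as_utf8 = misread.encode("utf-8")

# Writing the misread text as latin1 maps each character back to its
# original byte, so the file happens to contain valid UTF-8 again.
written_as_latin1 = misread.encode("iso-8859-1")

if __name__ == "__main__":
    print("tokens file bytes:", utf8_bytes.hex())
    print("output as utf-8  :", written_as_utf8.hex())
    print("output as latin1 :", written_as_latin1.hex())
    print("latin1 output decodes back to glyph:",
          written_as_latin1.decode("utf-8") == glyph)
```

If that assumption holds, it would explain why latin1 appears to preserve the character: the bytes pass through unchanged, rather than the character being decoded correctly.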

You don't specify that your Linux box is set up to use utf-8 for its locale settings. Can
we assume this to be the case?  Would the tool you use to inspect the target file be affected
by this?

>> -----Original Message-----
>> From: Brian C. Hill []
>> Sent: Wednesday, May 19, 2010 8:11 PM
>> To:
>> Subject: encoding problem with tokens (on Linux)
>>   I am having a weird problem with the encoding during token
>> replacement
>> (ant code below).
>> Details: CentOS 5, SunOS 5, Darwin 9 // ant 1.6/1.7 // java
>> 1.3/1.5/1.6
>> I have a simple token file with a simple variable ('A') and a single
>> Chinese character (广 or cat -v: M-eM-9M-?).
>> The template is just "Hello @A@."
>> The file command says that the tokens file is UTF8 and the template
>> file
>> is ASCII.
>> If I specify UTF-8 or don't specify an encoding, the encoding gets
>> messed
>> up. The 3-byte Chinese character gets replaced with 4 bytes which
>> prints
>> as nonsense. BUT, if I specify latin1 for the encoding, the Chinese
>> character is maintained properly. Note that I also tried putting a
>> Chinese character into the template as well to get the file command to
>> see the template file as UTF-8, which didn't help the token
>> replacement
>> if I went back to UTF-8 encoding. Interestingly, the Chinese character
>> in the template itself is maintained regardless of the copy encoding
>> used (again, the token replacement still gets messed up under UTF-8).
>> I simply cannot explain this. What am I missing? If all of the files
>> are
>> seen as UTF-8 and the Chinese character indeed seems to be encoded as
>> a
>> 3-byte UTF-8 character (as opposed to unicode), what does latin1 have
>> to
>> do with this?
>> Any clues will be appreciated!
>> Brian
>> ----------------------------------------------------------------------
>> ---------------------------------------------
>> <project name="TokenReplacement" default="replace.tokens" basedir="../.">
>>   <property name="replace.dir" value="/tmp/ant-char-problem"/>
>>   <property name="tokens.file" value="/tmp/ant-char-problem/tokens"/>
>>   <target name="replace.tokens" >
>>     <copy todir="${replace.dir}" overwrite="true" encoding="utf-8">
>>       <fileset dir="${replace.dir}">
>>         <include name="**/*.tmpl"/>
>>       </fileset>
>>       <filterset begintoken="@" endtoken="@">
>>         <filtersfile file="${tokens.file}"/>
>>       </filterset>
>>       <mapper type="glob" from="*.tmpl" to="*"/>
>>     </copy>
>>   </target>
>> </project>
>> ---------------------------------------------------------------------