cocoon-users mailing list archives

From Pier Fumagalli <>
Subject Re: [Help]How can I use non-ascii file name?
Date Mon, 16 Aug 2004 09:57:35 GMT
On 12 Aug 2004, at 12:45, roy huang wrote:

> Hi,all:
>     Use reader to display jpg or gif is quite simple,like:
>    <map:match pattern="*.jpg">
>     <map:read mime-type="image/jpg" src="jpg/{1}.jpg" />
>    </map:match>
>    But if the file name is not ASCII but utf-8 or other encoding like 
> 花.jpg (simplified Chinese),the resolver didn't resolve the name 
> correctly,error occur:
> org.apache.cocoon.ResourceNotFoundException: Error during resolving of 
> the input stream: org.apache.excalibur.source.SourceNotFoundException: 
> file:/C:/My 
> Documents/IBM/wsad/workspace/PowerOA/WebContent/test/jpg/花.jpg 
> doesn't exist.
> How can I use non-ASCII file name in cocoon?I can't find any 
> description or help in wiki or archived mail list.
> Roy Huang

It does indeed look like a bug...

I have this sitemap snippet:

     <map:match pattern="谷*">
       <map:generate src="谷{1}.xml"/>
       <map:transform src="welcome.xslt">
         <map:parameter name="contextPath" 
       <map:serialize type="xhtml"/>

and a file on the disk called "谷理子.xml". Somewhere, when I make a 
request for "http://localhost:8888/谷理子", the whole thing goes 

Now, the URL is passed correctly, as I see that in the access log:

INFO    (2004-08-16) 10:26.36:538   [access] 
(/%e8%b0%b7%e7%90%86%e5%ad%90) main-3/CocoonServlet: '????????' 
Processed by Apache Cocoon 2.1.5 in 27 milliseconds.

The above-mentioned string's encoding in UTF-8 is, in fact, "E8 B0 B7 
E7 90 86 E5 AD 90", so, cocoon receives it correctly, but somehow it 
gets lost in the process.
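
To double-check the byte sequence outside of Cocoon, here is a minimal standalone sketch in plain Java. The class and method names are made up for illustration and nothing here is Cocoon code. It percent-decodes the path from the access log once as UTF-8 and once as ISO-8859-1:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cocoon: decodes a raw request path
// with an explicitly chosen charset.
final class PathDecodingCheck {

    // Percent-decode the path and interpret the resulting bytes in 'charset'.
    static String decode(String rawPath, Charset charset) {
        try {
            return URLDecoder.decode(rawPath, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for the standard charsets used below.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "/%e8%b0%b7%e7%90%86%e5%ad%90"; // path as logged above

        String asUtf8 = decode(raw, StandardCharsets.UTF_8);
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // "\u8c37\u7406\u5b50" is the file name from the sitemap, written as escapes.
        System.out.println(asUtf8.equals("/\u8c37\u7406\u5b50")); // true
        System.out.println(asUtf8.length());   // 4: slash plus three characters
        System.out.println(asLatin1.length()); // 10: slash plus one char per byte
    }
}
```

Decoded as UTF-8 the nine bytes give back the three characters of the file name; decoded as ISO-8859-1 they become nine unrelated characters.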

Now, if I modify my sitemap to

     <map:match pattern="tanisatoko">
       <map:generate src="谷理子.xml"/>
       <map:transform src="welcome.xslt">
         <map:parameter name="contextPath" 
       <map:serialize type="xhtml"/>

and make a request to "http://localhost:8888/tanisatoko", the thing 
works perfectly. So we can safely rule out the generation process as 
the culprit.

Now, the _odd_ thing I noticed is that in those cases, I get an error 
of "PipelineNotFound", not a "ResourceNotFound", which means that the 
matcher seriously doesn't see that request.

Switching the matcher to a 'regexp' matcher doesn't change anything, 
so I bet the problem is in the data we feed to the matcher.

Now, changing that matcher to 
the encoding, 
and running it again, I get my nice page correctly.

I bet that somewhere (I don't know where, but surely somewhere), the 
UTF-8 encoded URL is converted into a string using the current locale 
(MacRoman on my system), or a default of "ISO-8859-1", before the 
string is actually given to the sitemap.
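
To illustrate what I suspect is happening, here is a small sketch in plain Java. It assumes the wrong charset is ISO-8859-1 (I have not confirmed which one Cocoon or the container actually uses), and the class and method names are invented for the example:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the suspected bug, not Cocoon code.
final class MisdecodedPath {

    // Simulate the suspected bug: UTF-8 bytes interpreted as ISO-8859-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo it: recover the original bytes, then read them as UTF-8.
    // This only works because ISO-8859-1 maps every byte to exactly one char.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u8c37\u7406\u5b50";  // the file name from the sitemap
        String garbled = misdecode(name);

        System.out.println(garbled.length());               // 9, one char per byte
        System.out.println(garbled.startsWith("\u8c37"));   // false
        System.out.println(repair(garbled).equals(name));   // true
    }
}
```

If the matcher really is handed a string like the garbled one, a pattern starting with "谷" has nothing to match against, which would fit the "PipelineNotFound" error. With a lossy charset in the middle, the round trip in repair() would not be guaranteed to work.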

Not having the sources at hand at the moment, I can't do a quick build 
to add some debugging output, but you get the idea.

