velocity-user mailing list archives

From "Geir Magnusson Jr." <ge...@optonline.net>
Subject Re: template encodings
Date Sun, 15 Jul 2001 16:15:55 GMT
Jonathan Revusky wrote:
> 
> "Geir Magnusson Jr." wrote:
> >
> > Jonathan Revusky wrote:
> > >
> > > "Geir Magnusson Jr." wrote:
> > > >
> > > > Jonathan Revusky wrote:
> > > > >
> > > > > I have the Velocity developer guide doc in front of me and I'm looking
> > > > > at the section entitled "Template Encoding for Internationalization".
> > > > >
> > > > > There is a method called mergeTemplate that takes the encoding as an
> > > > > argument.
> > > > >
> > > > > My belief is that once you figure out the desired locale and call
> > > > > response.setLocale(myLocale), simply writing to the Writer object
> > > > > returned by response.getWriter() will (should) encode the output
> > > > > appropriately.
> > > >
> > > > I am not sure what you are getting at - but note that the encoding
> > > > argument to mergeTemplate() is the encoding of the *template*, not the
> > > > output.  The encoding of the output stream is something else entirely.
> > >
> > > Are you sure you meant to say the above? That mergeTemplate AFAICS is
> > > clearly used to output. You already have a read pre-parsed template
> > > object at that stage.
> > >
> >
> > Pretty sure, as I wrote the code... :)
> >
> > public static boolean mergeTemplate( String templateName,
> >                                      Context context, Writer writer )
> >
> > public static boolean mergeTemplate( String templateName, String encoding,
> >                                      Context context, Writer writer )
> >
> > mergeTemplate() takes a template name, an encoding [in the second
> > method] to specify the encoding of the template, a context, and a writer
> > to output to.
> >
> > The first version, w/o the encoding argument, is implemented as
> >
> > return mergeTemplate( templateName,
> >                       Runtime.getString(INPUT_ENCODING,ENCODING_DEFAULT),
> >                       context, writer );
> >
> > so you can see that it will try to get the property INPUT_ENCODING, and
> > use the Velocity-provided default if not found.
> >
> > Note that these two methods have *nothing* to do with servlets at all.
> > We don't care *where* in your application the writer comes from...
> >
> > As for the 'pre-parsed' template, you are confusing it with the
> > getTemplate() method, which also has a version that takes an encoding
> > argument (otherwise uses the default).
> >
> > When you use the Template object returned by getTemplate(), then you are
> > right, you have a pre-parsed template and the merge is
> >
> > template.merge( context, writer );
> >
> > no encoding needed in either direction, because the template itself is
> > already converted and parsed, and the writer handles the output
> > encoding...
> >
> > > But I think I understand the issue now anyhow.
> > >
> > > It seems to me that this kind of helper method is necessary, since you
> > > aren't always using velocity from within a servlet engine. That had
> > > slipped my mind.
> >
> > It's *always* necessary, even in a servlet engine - you need to declare
> > the encoding used for the template - template can be in any encoding you
> > choose to work in.
> >
> > Again, this encoding has *nothing* to do with output.
> 
> Nothing at all to do with it? I mean, as a purely theoretical matter,
> you're probably right.  But if I have a page encoded in ISO-8859-1, what is
> the likelihood that I will output this to a client using a different
> character encoding, like Arabic or Chinese? Surely, in practice, there
> is near-perfect correlation between the input encoding and the output
> encoding...
> 

You aren't getting what I am saying:

The input encoding has nothing to do with the output encoding.  The
input encoding is specified to convert the byte stream that is the
template into a set of characters that can be parsed.  This is what
Velocity needs and cares about.

The output encoding is how your writer converts the rendered
character stream into a byte stream for output.  Configuring the
writer is what you, the application programmer, do, and Velocity doesn't
care.
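
To make the distinction concrete, here is a rough sketch that uses only
plain JDK classes, not Velocity's own API.  The class and method names
(EncodingDemo, decode, encode) are made up for illustration; the point is
only that the decoding step and the encoding step are configured
independently of each other.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

public class EncodingDemo {

    // Input side: turn the template's bytes into characters.
    // This is the only place the *input* encoding matters.
    public static String decode(byte[] templateBytes, String inputEncoding) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(templateBytes), Charset.forName(inputEncoding))) {
            StringBuilder chars = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                chars.append((char) c);
            }
            return chars.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Output side: the Writer decides how characters become bytes.
    // This is the only place the *output* encoding matters.
    public static byte[] encode(String rendered, String outputEncoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, Charset.forName(outputEncoding))) {
            writer.write(rendered);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // "caf" followed by e-acute, stored as ISO-8859-1 bytes.
        byte[] template = { 'c', 'a', 'f', (byte) 0xE9 };
        String chars = decode(template, "ISO-8859-1");

        // The same characters, written through two differently configured writers.
        byte[] latin1 = encode(chars, "ISO-8859-1");
        byte[] utf8 = encode(chars, "UTF-8");

        System.out.println(chars.length() + " chars, " + latin1.length
                + " bytes as ISO-8859-1, " + utf8.length + " bytes as UTF-8");
    }
}
```

The same four characters come out as four bytes through one writer and
five through the other, and nothing on the input side had to change.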

Velocity does have a configuration property OUTPUT_ENCODING, but this is
only used by the VelocityServlet convenience class and Anakia to have a
default encoding with which to configure the Writer.

With regards to 'in practice', that's only because people tend to work
in the encoding that is native to their locale or audience.  But since
Velocity does decode every template from *some* encoding into
characters, it's irrelevant what you encode in.


> >
> > > >
> > > > mergeTemplate() is a utility method provided by the Velocity helper
> > > > class to allow you to easily render a template.  The usual mechanism is
> > > > to use the pattern of
> > > >
> > > > Template t = getTemplate( name, encoding );
> > >
> > > This one, I was not using in my code. Now I have patched it though. I
> > > assume that a template found via the name_zh_TW.html lookup scheme is
> > > encoded in the encoding for that locale (Taiwan) which is "Big5". Though
> > > I don't know how safe an assumption that is.
> >
> > Not at all.  You have to know the encoding, and specify it.
> 
> Yes, I do. I specify "Big5" for example in the case above, since that is
> the encoding for Taiwan. If a template file for the zh_TW locale is
> always encoded in Big5 then I'm A-OK.

That's right.  As long as you either specify it at getTemplate() time,
or make it the default INPUT_ENCODING for Velocity.
 
> > Velocity
> > assumes that if you don't specify, the template byte stream (no matter
> > where it comes from) is encoded in ISO-8859-1
> 
> Yeah, I noticed that, but I don't see why you use that rather than the
> platform's default encoding? You can get it via
> System.getProperty("file.encoding")

History, at this point. :)  That can be changed.
 
> >
> > >
> > > I don't know whether anybody is using all this structure, since if you
> > > don't specify a locale, then it's default locale and default encoding
> > > all ways around, which surely tends to work for unilingual web sites.
> > > I've tried to go the extra mile in my framework to do the right things
> > > transparently, but I don't have feedback from people as to whether it
> > > works for Asian languages etcetera.
> >
> > Yes, people are using it, and quite effectively.  And it's not even
> > 'default locale'.  It's LATIN-1. :)
> 
> Yes, but that's the default encoding for anybody in the Western
> Hemisphere or Western Europe. So for the vast majority of your user
> base, the above distinction is quibbling.

Yep

> 
> > People who need and understand it
> > use it quite well.  The internationalization features were driven by
> > direct user request.
> >
> > > > t.merge( context, writer);
> > >
> > > This is the method I was using it turns out.
> >
> > Ah.  See, there is no encoding parameter - because you established the
> > encoding for the writer if you got it from the response object in a
> > servlet environment, and you appear to be simply lucky on the input side
> > :)
> 
> No, it wouldn't be just luck. I specify the encoding on the input side.
> The Niggle framework looks at the name of a template such as
> mytemplate_ru.html (you would fish that out as a result of looking for
> mytemplate.html from a context in which the preferred locale was Russian,
> say) and assumes that the page template in question is encoded with
> the encoding used for that locale.
> 
> Which is ISO-8859-5
> 
> At least in the Freemarker/Webmacro bridge code, I was doing this. I had
> slipped up and was not doing so in the Velocity bridge code, but I only
> wrote that yesterday. :-)

Cool - Because you implicitly note the encoding in the template name, I
think that incorporating this aspect with Velocity will be very easy for
you.
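
Something along these lines might be all it takes.  This is only a sketch:
the class TemplateEncodings, the method encodingFor() and the
suffix-to-encoding table are hypothetical, and the table is an assumption
you have to maintain yourself, since nothing guarantees a file really is
in the encoding its name suggests.  The returned string is what you would
then hand over as the encoding argument when you fetch the template.

```java
import java.util.HashMap;
import java.util.Map;

public class TemplateEncodings {

    // Hypothetical table: locale suffix in the template name -> assumed encoding.
    private static final Map<String, String> BY_SUFFIX = new HashMap<>();
    static {
        BY_SUFFIX.put("zh_TW", "Big5");
        BY_SUFFIX.put("ru", "ISO-8859-5");
    }

    // Fallback when the name carries no known locale suffix.
    private static final String DEFAULT_ENCODING = "ISO-8859-1";

    // Returns the encoding assumed for a name like "mytemplate_ru.html".
    public static String encodingFor(String templateName) {
        int dot = templateName.lastIndexOf('.');
        String base = (dot < 0) ? templateName : templateName.substring(0, dot);
        for (Map.Entry<String, String> entry : BY_SUFFIX.entrySet()) {
            if (base.endsWith("_" + entry.getKey())) {
                return entry.getValue();
            }
        }
        return DEFAULT_ENCODING;
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("name_zh_TW.html"));
        System.out.println(encodingFor("mytemplate_ru.html"));
        System.out.println(encodingFor("mytemplate.html"));
    }
}
```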
 
> Now, I'm using the same scheme with velocity.
> 
> >
> > > >
> > > > so you see, fundamentally the two steps are distinct.  You could merge
> > > > the same template repeatedly with different writers that use different
> > > > encodings, if you wished.
> > >
> > > Yes, that's true, though I don't know if it's really a likely scenario.
> > >
> > > Thanks. When I wrote the note, actually it did slip my mind that
> > > Velocity might be used in non-servlet contexts, so that does explain the
> > > separate mergeTemplate helper code.
> >
> > Actually, that's not why it's there - most people still use the
> > getTemplate() -> template.merge() pattern in applications...
> > getTemplate() still requires you to declare the input encoding of the
> > template if it is not LATIN-1.
> 
> Actually, I missed that mergeTemplate took the *name* of the
> template as well. I was also just wondering to what extent
> this had been investigated and tested.

Extent to which *what* has been investigated and tested???  Is there a
bug?

geir

-- 
Geir Magnusson Jr.                           geirm@optonline.net
System and Software Consulting
Developing for the web?  See http://jakarta.apache.org/velocity/
You have a genius for suggesting things I've come a cropper with!
