From Craig McDaniel <cpmcdan...@gmail.com>
Subject Re: Non-breaking space turning into "??"
Date Thu, 01 Dec 2005 03:42:11 GMT
I've been able to debug this a little, and it seems that even
though I am setting the output encoding to UTF-8, it is being written
as ASCII. Since we can't get much farther without posting code, here
is the relevant part:

  Serializer serializer = SerializerFactory.getSerializer(props);
  log.debug("Output Encoding: " +
  serializer.setOutputStream(new FileOutputStream(results));
  filters[lastFilter].parse(new InputSource(new FileReader(xmlFile)));
  log.debug("Finished the transformation");
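The snippet above gives the parser a `FileReader`, which decodes `xmlFile` with the JVM's default charset before the parser sees any bytes. Below is a minimal sketch of a transform that keeps bytes in and bytes out. It uses the JDK's built-in JAXP `Transformer` instead of the Xalan `Serializer` and filter chain from the original code, and it does not show the FOP 0.20.5 side. The input goes to the parser as an `InputStream`, so the parser can detect the encoding itself, and the output encoding is set explicitly. The class name `ByteTransform` and the tiny stylesheet are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: XML bytes in, UTF-8 bytes out, with no
// platform-default Reader or Writer anywhere in between.
public class ByteTransform {

    static byte[] transform(InputStream xsl, InputStream xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xsl));
        // Request UTF-8 explicitly; writing to an OutputStream lets the
        // serializer do the char-to-byte encoding itself.
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // StreamSource(InputStream) lets the parser detect the input encoding
        // (BOM / XML declaration, UTF-8 if none) instead of the JVM default.
        t.transform(new StreamSource(xml), new StreamResult(out));
        return out.toByteArray();
    }

    // Stylesheet with a literal &#x00a0;, similar to the one described in the thread.
    static byte[] demo() {
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:template match=\"/\">"
                + "<out>General&#x00a0;<xsl:value-of select=\"/name\"/></out>"
                + "</xsl:template></xsl:stylesheet>";
        String xml = "<name>Electric</name>";
        try {
            return transform(
                    new ByteArrayInputStream(xsl.getBytes(StandardCharsets.UTF_8)),
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Decode with UTF-8 only for display; the bytes themselves stay untouched.
        System.out.println(new String(demo(), StandardCharsets.UTF_8));
    }
}
```

The resulting byte array could then go straight into a `ByteArrayInputStream` for the next stage without being decoded in between.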

The first log message does print "Output Encoding: UTF-8". However,
when I create a FileReader for this same File ("results" in the code
above) and call getEncoding() on it, it prints "ASCII". Also, when I
look at the file with less, I see "General<C2><A0>Electric", and in
emacs I see "General??Electric". Up to this point this is just an XSL
transform, nothing FOP-specific (though the output is an FO document),
so perhaps the Xalan list is the proper place for this question?
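Two details in the paragraph above can be checked on their own. First, `<C2><A0>` is exactly the UTF-8 encoding of U+00A0 (the character written as `&#x00a0;` in the stylesheet), which suggests the results file itself is valid UTF-8. Second, `getEncoding()` reports the charset the *reader* was constructed with, using a historical name such as `ASCII` or `UTF8`. For a `FileReader` that is the JVM's default charset, so the call does not tell you anything about the file's contents. A small sketch; the class name `EncodingCheck` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hex dump of the UTF-8 encoding of a string, e.g. "C2 A0" for U+00A0.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The name a reader reports depends only on the charset it was built with,
    // not on the bytes it is about to read (a ByteArrayInputStream needs no close).
    static String readerEncoding(Charset cs) {
        byte[] utf8Bytes = "General\u00a0Electric".getBytes(StandardCharsets.UTF_8);
        InputStreamReader r = new InputStreamReader(new ByteArrayInputStream(utf8Bytes), cs);
        return r.getEncoding();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00a0"));                         // C2 A0
        System.out.println(readerEncoding(StandardCharsets.US_ASCII)); // ASCII
        System.out.println(readerEncoding(StandardCharsets.UTF_8));    // UTF8
        // The charset a plain FileReader would use on this JVM:
        System.out.println(Charset.defaultCharset());
    }
}
```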

Here is the code for the Reader:

  FileReader fileReader = new FileReader(foFile);
  BufferedReader reader = new BufferedReader(fileReader);
  log.debug("Encoding for " + foFile + ": " + fileReader.getEncoding());

Again, this prints "Encoding for /tmp/quarterly40215.xml: ASCII". At
this point, the reader is used to read the file into a byte array,
which is then wrapped in a ByteArrayInputStream and fed to the FOP
Driver. Are we any closer?
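The round trip described above can be reproduced without any files. This sketch assumes the server's default charset is US-ASCII, as the `getEncoding()` output suggests. The UTF-8 bytes `C2 A0` are not valid ASCII, so decoding them yields two U+FFFD replacement characters, and re-encoding those to ASCII yields two `?` bytes. The second method shows one way to avoid this: copy the bytes without going through a `Reader` at all, so FOP's parser sees the original UTF-8. It uses `Files.readAllBytes`, which exists only since Java 7, later than the JDK used in this thread. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class RoundTrip {

    // Simulates reading file bytes through a Reader that uses `readerCharset`,
    // then turning the chars back into bytes with that same charset.
    static byte[] throughReader(byte[] fileBytes, Charset readerCharset) {
        String decoded = new String(fileBytes, readerCharset); // invalid bytes -> U+FFFD
        return decoded.getBytes(readerCharset);                // unmappable chars -> '?'
    }

    // Byte-preserving alternative: no Reader, no decoding, no re-encoding.
    static InputStream rawInput(Path foFile) throws IOException {
        return new ByteArrayInputStream(Files.readAllBytes(foFile));
    }

    public static void main(String[] args) {
        byte[] utf8 = "General\u00a0Electric".getBytes(StandardCharsets.UTF_8);
        byte[] mangled = throughReader(utf8, StandardCharsets.US_ASCII);
        System.out.println(new String(mangled, StandardCharsets.US_ASCII)); // General??Electric
        byte[] intact = throughReader(utf8, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(utf8, intact));                    // true
    }
}
```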

On 11/25/05, Craig McDaniel <cpmcdaniel@gmail.com> wrote:
> On 11/25/05, Andreas L Delmelle <a_l.delmelle@pandora.be> wrote:
> > On Nov 25, 2005, at 22:14, Craig McDaniel wrote:
> >
> > > I am trying to debug a PDF rendering for a client where non-breaking
> > > spaces are coming out as double question marks "??". FOP is being
> > > called from a servlet. I have tried using the fop command line tool
> > > and cannot reproduce the problem. I have written a simple servlet on
> > > another system that functionally does the same thing, and cannot
> > > reproduce the problem here either.
> > >
> > > Any ideas what could cause this? Is it some kind of character encoding
> > > issue?
> >
> > Indeed. The question-marks are most likely related to:
> > http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/CharsetEncoder.html
> >
> > > The entity &#x00a0; is being used. What should my next step be
> > > in debugging this?
> >
> > Firstly: are you still using FOP 0.20.5? If so, can you try out the
> > recent alpha release, and report if the problem still occurs?
> I am using 0.20.5. Unfortunately, I do not have access to deploy
> changes to the server at this time, so I am unable to test changes in
> the only environment where the problem is happening ;-(
> > If you can't (or are already using FOP 0.90alpha), I think the best
> > bet is to go looking for places --in the servlet code, I presume--
> > where an XML declaration is hard-coded as a String literal or where a
> > Charset is used that's different from the default (= UTF-8).
> The original data file has no XML declaration. The stylesheet has one,
> but does not have an encoding attribute. The &#x00a0; entities are in
> the XSL, by the way.
> I almost feel like I am debugging this thing blind. I do have the
> source code, but it is too spread out to post here. It might be worth
> pointing out that the XSL is applied to the XML data and sent to a
> ByteArrayOutputStream. The byte array is then stored and later passed
> into the FOP driver as a ByteArrayInputStream. Likewise, the output of
> the driver is written to a byte array and finally, it gets sent to the
> browser with response.getOutputStream().write(bytes). Not the way I
> would have done it. Anyway, like I said, I coded up a servlet just
> like this one and could not reproduce the problem in my own
> environment. Perhaps this is a default encoding problem.
> > HTH!
> Absolutely, thanks for your help!
> > Cheers,
> >
> > Andreas
> --
> Craig McDaniel
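Andreas's pointer to `CharsetEncoder` is about how an encoder handles characters the target charset cannot represent. With `CodingErrorAction.REPLACE` it writes the replacement bytes, which default to `?` for US-ASCII. With `REPORT` it throws instead. The minimal sketch below uses the standard `java.nio.charset` API; the class name is hypothetical. Encoding the single character U+00A0 directly yields one `?`. Two `?` per space therefore fits better with UTF-8 bytes being decoded as ASCII somewhere first. Switching a debugging build to `REPORT` is one way to find the exact place where the character is lost, rather than seeing a `?` later in the PDF.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncoderDemo {

    // Encode `text` as US-ASCII, using `action` for malformed or unmappable input.
    static String encodeAscii(String text, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetEncoder enc = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer bytes = enc.encode(CharBuffer.wrap(text));
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return new String(out, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        try {
            // REPLACE: the non-breaking space silently becomes a single '?'.
            System.out.println(encodeAscii("General\u00a0Electric", CodingErrorAction.REPLACE)); // General?Electric
            // REPORT: the same input raises UnmappableCharacterException.
            encodeAscii("General\u00a0Electric", CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            System.out.println("Unmappable: " + e);
        }
    }
}
```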

Craig McDaniel

