xmlgraphics-fop-users mailing list archives

From Craig McDaniel <cpmcdan...@gmail.com>
Subject Re: Non-breaking space turning into "??"
Date Thu, 01 Dec 2005 04:43:10 GMT
OK, I was able to get one of the question marks to go away (leaving a
single question mark where the space should be). Here is what I
changed:

serializer.setOutputStream(new PrintStream(new
FileOutputStream(results), false, "UTF-8"));

and to read the file....

InputStreamReader fileReader = new InputStreamReader(new
FileInputStream(html), "UTF-8");
BufferedReader reader = new BufferedReader(fileReader);
log.debug("Encoding for " + html + ": " + fileReader.getEncoding());

....this prints "UTF8" as the encoding (without the dash). What's up
with that? Anyway, I think we are getting closer.
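The missing dash is expected. `InputStreamReader.getEncoding()` is documented to return the historical name of the charset, not its canonical `Charset.name()`, and "UTF8" is the historical name of UTF-8. The "ASCII" reported in the quoted message below is likewise the historical name of US-ASCII. It describes the charset the reader decodes with (the JVM default, for a plain `FileReader`), not the encoding of the bytes in the file. Below is a small sketch for comparing the two names. The class is hypothetical and written for a current JDK (`StandardCharsets` needs Java 7+, which is newer than the setup in this thread).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class EncodingNames {

    /** Returns the name InputStreamReader.getEncoding() reports for the given charset. */
    public static String readerEncoding(Charset cs) {
        // An in-memory stream holds no OS resources, so the reader is not closed here.
        // (getEncoding() returns null once the reader has been closed.)
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), cs);
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        for (Charset cs : new Charset[] {StandardCharsets.UTF_8, StandardCharsets.US_ASCII}) {
            System.out.println(cs.name() + " -> " + readerEncoding(cs));
        }
        // The charset a plain FileReader would use on this JVM:
        System.out.println("default: " + readerEncoding(Charset.defaultCharset()));
    }
}
```

Running `main` on the affected server would print each canonical name next to the name the reader reports, plus the default that any `FileReader` there silently uses.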

On 11/30/05, Craig McDaniel <cpmcdaniel@gmail.com> wrote:
> I've been able to debug this a little bit, and it seems that, even
> though I am setting the output encoding to UTF-8, it is being written
> as ASCII. Since we can't get much farther without posting code, here
> goes:
>
>   Serializer serializer = SerializerFactory.getSerializer(props);
>   log.debug("Output Encoding: " +
> serializer.getOutputFormat().getProperty("encoding"));
>   serializer.setOutputStream(new FileOutputStream(results));
>   filters[lastFilter].setContentHandler(serializer.asContentHandler());
>   filters[lastFilter].parse(new InputSource(new FileReader(xmlFile)));
>   log.debug("Finished the transformation");
>
> The first log message indeed prints "Output Encoding: UTF-8". However,
> when I create a FileReader for this same File ("results" in the code
> above) and call fileReader.getEncoding(), it prints "ASCII". Also, when I
> look at the file with less, I see "General<C2><A0>Electric", and in
> emacs, I see "General??Electric". This is just an XSL transform up to
> this point, nothing FOP-specific (though the file is an FO document),
> so perhaps the Xalan list is the proper place for this question?
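The `<C2><A0>` that `less` shows is the two-byte UTF-8 encoding of U+00A0. The transform output itself therefore appears to be correct UTF-8, and the "??" in emacs most likely means the file is being decoded with some other charset. The sketch below uses the standard JAXP identity `Transformer`, swapped in for the Xalan `Serializer` used above, so it is not the servlet's actual code. It shows a UTF-8 output property turning `&#xA0;` into exactly those bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8Output {

    /** Runs an identity transform and returns the serialized bytes in the given encoding. */
    public static byte[] serialize(String xml, String encoding) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, encoding);
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Serializing to an OutputStream lets the transformer do the char-to-byte step.
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toByteArray();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Renders bytes >= 0x80 as <XX>, roughly the way less displays them. */
    public static String asLessWouldShow(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append(String.format("<%02X>", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("<b>General&#xA0;Electric</b>", "UTF-8");
        System.out.println(asLessWouldShow(bytes));
    }
}
```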
>
> Here is the code for the Reader:
>
>   FileReader fileReader = new FileReader(foFile);
>   BufferedReader reader = new BufferedReader(fileReader);
>   log.debug("Encoding for " + foFile + ": " + fileReader.getEncoding());
>
> Again, this prints "Encoding for /tmp/quarterly40215.xml: ASCII". At
> this point, the reader is used to read the file into a byte array.
> Then it is wrapped in a ByteArrayInputStream and fed to the FOP
> Driver. Are we any closer?
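The code that fills the byte array isn't shown. The following assumes it reads characters through the `FileReader` and turns them back into bytes with the default charset (for example via `String.getBytes()`). Under that assumption, both symptoms in this thread come from lossy conversions. Decoding C2 A0 as ASCII produces two replacement characters, which are then encoded as "??". Once decoding uses UTF-8, encoding U+00A0 back to ASCII still produces one '?'. Below is a sketch that reproduces both with explicit charsets (the class name is made up).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NbspRoundTrip {

    /** Decodes bytes with one charset, then re-encodes the characters with another. */
    public static byte[] roundTrip(byte[] input, Charset decodeWith, Charset encodeWith) {
        String text = new String(input, decodeWith); // malformed bytes become U+FFFD
        return text.getBytes(encodeWith);            // unmappable chars become '?'
    }

    public static void main(String[] args) {
        // What the XSLT step wrote: U+00A0 as the UTF-8 bytes C2 A0.
        byte[] fo = "General\u00A0Electric".getBytes(StandardCharsets.UTF_8);

        // Assumed original path: ASCII-default FileReader, then ASCII getBytes().
        byte[] before = roundTrip(fo, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);
        // After the reader fix: UTF-8 decoding, but still ASCII encoding.
        byte[] after = roundTrip(fo, StandardCharsets.UTF_8, StandardCharsets.US_ASCII);
        // UTF-8 on both sides preserves the character.
        byte[] fixed = roundTrip(fo, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println(new String(before, StandardCharsets.US_ASCII)); // General??Electric
        System.out.println(new String(after, StandardCharsets.US_ASCII));  // General?Electric
        System.out.println(java.util.Arrays.equals(fo, fixed));            // true
    }
}
```

The last comparison is the useful one. If the bytes are handed to FOP without the decode/encode step at all, or with UTF-8 on both sides, nothing is lost.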
>
>
> On 11/25/05, Craig McDaniel <cpmcdaniel@gmail.com> wrote:
> > On 11/25/05, Andreas L Delmelle <a_l.delmelle@pandora.be> wrote:
> > > On Nov 25, 2005, at 22:14, Craig McDaniel wrote:
> > >
> > > > I am trying to debug a PDF rendering for a client where non-breaking
> > > > spaces are coming out as double question marks "??". FOP is being
> > > > called from a servlet. I have tried using the fop command line tool
> > > > and cannot reproduce the problem. I have written a simple servlet on
> > > > another system that functionally does the same thing, and cannot
> > > > reproduce the problem here either.
> > > >
> > > > Any ideas what could cause this? Is it some kind of character encoding
> > > > issue?
> > >
> > > Indeed. The question-marks are most likely related to:
> > > http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/CharsetEncoder.html
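The relevant detail from that API is that `String.getBytes(Charset)` always substitutes the charset's default replacement bytes for characters it cannot map. For US-ASCII that is a single '?', so the damage is silent. When debugging, a `CharsetEncoder` set to `CodingErrorAction.REPORT` fails loudly instead. Below is a minimal sketch (hypothetical class, current JDK).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class UnmappableDemo {

    /** Encodes text, throwing instead of silently substituting replacement bytes. */
    public static byte[] encodeStrict(String text, Charset cs) throws CharacterCodingException {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(text));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String text = "General\u00A0Electric";
        // Lenient path: the unmappable U+00A0 silently becomes the replacement byte '?'.
        System.out.println(new String(text.getBytes(StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII));
        try {
            encodeStrict(text, StandardCharsets.US_ASCII);
        } catch (CharacterCodingException e) {
            // Strict path: an UnmappableCharacterException points at the problem.
            System.out.println("Rejected: " + e);
        }
    }
}
```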
> > >
> > > > The entity &#x00a0; is being used. What should my next step be
> > > > in debugging this?
> > >
> > > Firstly: are you still using FOP 0.20.5? If so, can you try out the
> > > recent alpha release, and report if the problem still occurs?
> >
> > I am using 0.20.5. Unfortunately, I do not have access to deploy
> > changes to the server at this time, so I am unable to test changes in
> > the only environment where the problem is happening ;-(
> >
> > > If you can't (or are already using FOP 0.90alpha), I think the best
> > > bet is to go looking for places --in the servlet code, I presume--
> > > where an XML declaration is hard-coded as a String literal or where a
> > > Charset is used that's different from the default (= UTF-8).
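Here is an illustration of the kind of mismatch described here. If the bytes are UTF-8 but a hard-coded declaration says otherwise, the parser decodes the non-breaking space's two bytes as two Latin-1 characters: Â followed by a non-breaking space. The sketch uses the JDK's DOM parser. The ISO-8859-1 declaration is an invented example, not something found in the servlet.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclarationMismatch {

    /** Parses the bytes and returns the document element's text content. */
    public static String parseText(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<a>General\u00A0Electric</a>";
        // The bytes are UTF-8 in both cases; only the declared encoding differs.
        byte[] honest = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);
        byte[] mismatched = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(honest).equals("General\u00A0Electric"));
        System.out.println(parseText(mismatched).equals("General\u00C2\u00A0Electric"));
    }
}
```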
> >
> > The original data file has no XML declaration. The stylesheet has one,
> > but does not have an encoding attribute. The &#x00a0; entities are in
> > the XSL, by the way.
> >
> > I almost feel like I am debugging this thing blind. I do have the
> > source code, but it is too spread out to post here. It might be worth
> > pointing out that the XSL is applied to the XML data and sent to a
> > ByteArrayOutputStream. The byte array is then stored and later passed
> > into the FOP driver as a ByteArrayInputStream. Likewise, the output of
> > the driver is written to a byte array and finally, it gets sent to the
> > browser with response.getOutputStream().write(bytes). Not the way I
> > would have done it. Anyway, like I said, I coded up a servlet just
> > like this one and could not reproduce the problem in my own
> > environment. Perhaps this is a default encoding problem.
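A default-encoding difference would explain why the same code behaves differently on two machines. `FileReader`, `FileWriter`, `new String(byte[])` and `String.getBytes()` all use the JVM's default charset. That charset comes from the platform locale, and on older JDKs also from the `file.encoding` system property. Logging it on the affected server should confirm or rule this out. A bytes-only hand-off between the XSLT output and FOP sidesteps the question entirely. Below is a sketch of both. The class and method names are hypothetical, and the helper assumes the FO data can be read as raw bytes (for example from a `FileInputStream`) rather than through a `Reader`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    /** Summarizes the charset settings that implicit conversions on this JVM will use. */
    public static String describeDefaults() {
        return "defaultCharset=" + Charset.defaultCharset().name()
                + ", file.encoding=" + System.getProperty("file.encoding");
    }

    /** Copies a stream into a byte array without ever decoding it to characters. */
    public static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(describeDefaults());
        byte[] fo = {'A', (byte) 0xC2, (byte) 0xA0, 'B'}; // "A", NBSP as UTF-8, "B"
        byte[] copy = toByteArray(new ByteArrayInputStream(fo));
        System.out.println(java.util.Arrays.equals(fo, copy)); // true: no charset involved
    }
}
```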
> >
> > > HTH!
> >
> > Absolutely, thanks for your help!
> >
> > > Cheers,
> > >
> > > Andreas
> >
> >
> > --
> > Craig McDaniel
> >
>
>
> --
> Craig McDaniel
>


--
Craig McDaniel


