xmlgraphics-fop-users mailing list archives

From Glenn Adams <gl...@skynav.com>
Subject Re: FOP 1.1 - Japanese 4-byte characters are rendering as '?' in pdf
Date Wed, 20 May 2015 13:21:21 GMT
Firstly, I suggest you avoid using the term "4-byte Japanese characters",
since that has no meaning except in the context of some encoding, like
UTF-8, UTF-16, UTF-32, etc. In Java, all String objects are encoded in
UTF-16 as 16-bit code units. So BMP characters use one 16-bit code unit,
and non-BMP characters use two 16-bit code units, i.e., a high (leading)
and a low (trailing) surrogate.
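To make the distinction concrete, here is a small Java sketch comparing UTF-16 code units, code points and UTF-8 bytes for two characters. In UTF-8, BMP characters such as Hiragana and Katakana take three bytes each; only code points outside the BMP (for example kanji in CJK Unified Ideographs Extension B, which starts at U+20000) take four. The class name and the sample kanji U+20B9F are illustrative choices, not part of FOP.

```java
import java.nio.charset.StandardCharsets;

// Compares UTF-16 code units, code points and UTF-8 bytes for one BMP
// Katakana character and one non-BMP kanji.
public class EncodingLengths {

    // Number of bytes the string occupies when encoded as UTF-8.
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static void describe(String label, String s) {
        int cp = s.codePointAt(0);
        System.out.printf("%-12s U+%04X  UTF-16 units=%d  code points=%d  UTF-8 bytes=%d  BMP=%b%n",
                label, cp, s.length(), s.codePointCount(0, s.length()),
                utf8Length(s), Character.isBmpCodePoint(cp));
    }

    public static void main(String[] args) {
        // KATAKANA LETTER A (U+30A2), inside the BMP.
        String katakanaA = "\u30A2";
        // U+20B9F, a kanji in CJK Unified Ideographs Extension B (outside the BMP).
        String extBKanji = new String(Character.toChars(0x20B9F));

        describe("Katakana", katakanaA);    // 1 UTF-16 unit, 3 UTF-8 bytes
        describe("Ext-B kanji", extBKanji); // 2 UTF-16 units (surrogate pair), 4 UTF-8 bytes

        char hi = extBKanji.charAt(0);
        char lo = extBKanji.charAt(1);
        System.out.printf("surrogates: %04X (high=%b) %04X (low=%b)%n",
                (int) hi, Character.isHighSurrogate(hi), (int) lo, Character.isLowSurrogate(lo));
    }
}
```

So a Katakana string is never "4-byte" in any of these encodings except UTF-32; if text is rendered as '?', the cause is more likely an encoding mismatch or a font without the needed glyphs than the characters' size.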

Hiragana characters are encoded in the BMP in the range U+3040 to U+309F
[1], Katakana in U+30A0 to U+30FF [2], and CJK Unified Ideographs starting
at U+4E00 [3].

[1] http://www.unicode.org/charts/PDF/U3040.pdf
[2] http://www.unicode.org/charts/PDF/U30A0.pdf
[3] http://www.unicode.org/charts/PDF/U4E00.pdf
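To check which block a given character belongs to, and whether it is in the BMP, the standard java.lang.Character API is enough. The following is a minimal sketch; the class name and sample string are hypothetical, and the last sample is a kanji from CJK Unified Ideographs Extension B, included only to show what a genuinely non-BMP character looks like.

```java
import java.lang.Character.UnicodeBlock;

// Reports the Unicode block of each code point and whether it lies in the BMP.
public class JapaneseScriptCheck {

    public static String classify(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint); // null if the code point is in no block
        String plane = Character.isBmpCodePoint(codePoint) ? "BMP" : "non-BMP";
        return (block == null ? "NO_BLOCK" : block.toString()) + " (" + plane + ")";
    }

    public static void main(String[] args) {
        // Hiragana A (U+3042), Katakana A (U+30A2), a common kanji (U+6F22),
        // and the Extension B kanji U+20B9F.
        String sample = "\u3042\u30A2\u6F22" + new String(Character.toChars(0x20B9F));
        // codePoints() walks code points, so the surrogate pair is reported once.
        sample.codePoints().forEach(cp -> System.out.printf("U+%04X %s%n", cp, classify(cp)));
    }
}
```

Run as is, it reports HIRAGANA, KATAKANA and CJK_UNIFIED_IDEOGRAPHS as BMP, and only the last character as CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B (non-BMP).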

For external FO files, or XML files you will translate to FO via XSLT, you
should use the UTF-8 encoding of Unicode, and ensure that you provide a
correct XML declaration at the beginning of your file:

<?xml version="1.0" encoding="utf-8"?>
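If the FO or XML input is generated by your own Java code, the declared encoding only helps if the bytes on disk really are UTF-8. A minimal sketch, assuming you write the file yourself (the class name and the <doc> root element are hypothetical placeholders), is to pass the charset explicitly instead of relying on the platform default:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Writes an XML/FO file whose bytes really are UTF-8, matching its declaration.
public class WriteUtf8Xml {

    static final String DECLARATION = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";

    public static void write(Path file, String body) throws IOException {
        // Explicit charset: the output does not depend on the platform default encoding.
        try (Writer w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            w.write(DECLARATION);
            w.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get(args.length > 0 ? args[0] : "sample.xml");
        // Hypothetical minimal document containing the Katakana word "katakana".
        write(out, "<doc>\u30AB\u30BF\u30AB\u30CA</doc>\n");
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

A mismatch in the other direction, e.g. a file saved as Shift_JIS or windows-1252 but declared as utf-8, typically produces parse errors or replacement characters before FOP ever sees the text.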

I also suggest that you ensure the presence of the UTF-8 encoding of the
BOM [4] at the beginning of the file: 0xEF 0xBB 0xBF.

[4] http://en.wikipedia.org/wiki/Byte_order_mark
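A small Java helper can check for the three BOM bytes and prepend them if they are missing. This is a sketch under the assumption that the file content is already UTF-8 (prepending a UTF-8 BOM to a file in another encoding would make it inconsistent); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Detects and, if needed, prepends the UTF-8 byte order mark (EF BB BF).
public class Utf8Bom {

    // UTF-8 encoding of U+FEFF, the byte order mark.
    public static final byte[] BOM = "\uFEFF".getBytes(StandardCharsets.UTF_8);

    public static boolean hasBom(byte[] data) {
        return data.length >= 3 && data[0] == BOM[0] && data[1] == BOM[1] && data[2] == BOM[2];
    }

    // Returns data with a UTF-8 BOM in front, or data itself if it already has one.
    public static byte[] withBom(byte[] data) {
        if (hasBom(data)) {
            return data;
        }
        byte[] out = new byte[data.length + BOM.length];
        System.arraycopy(BOM, 0, out, 0, BOM.length);
        System.arraycopy(data, 0, out, BOM.length, data.length);
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Bom <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        byte[] data = Files.readAllBytes(file); // assumed to be UTF-8 already
        if (hasBom(data)) {
            System.out.println(file + " already starts with a UTF-8 BOM");
        } else {
            Files.write(file, withBom(data));
            System.out.println("Prepended UTF-8 BOM to " + file);
        }
    }
}
```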

On Wed, May 20, 2015 at 3:20 AM, mrunal28 <loharms@gmail.com> wrote:

> Hi Glenn,
>
> I am trying to understand whether Japanese Katakana is in the Unicode BMP,
> as per the link below:
> http://www.sttmedia.com/unicode-basiclingualplane
>
> Assuming that Katakana characters are what I call "4-byte" Japanese
> characters: as per your reply, if Katakana is in the BMP and FOP supports
> it, then my code using FOP 1.1 should render these Japanese characters
> correctly in the PDF.
>
> I am attaching the code I use to convert Japanese text into PDF.
> Please find the attached files: fop_allfonts.xconf
> <http://apache-fop.1065347.n5.nabble.com/file/n42155/fop_allfonts.xconf>
> ExampleXML2PDF.java
> <http://apache-fop.1065347.n5.nabble.com/file/n42155/ExampleXML2PDF.java>
>
> Please let me know if you have any suggestions about the shared files.
>
> So my questions are:
> 1. Is Katakana encoded in the BMP? I assume it is.
> 2. Am I missing something in my code to convert Japanese "4-byte"
> characters into PDF?
>
>
>
> --
> View this message in context:
> http://apache-fop.1065347.n5.nabble.com/FOP-1-1-Japanese-4-byte-characters-are-rendering-as-in-pdf-tp42117p42155.html
> Sent from the FOP - Users mailing list archive at Nabble.com.
