xmlgraphics-fop-users mailing list archives

From Marc Kaufman <marck...@adobe.com>
Subject RE: isolated high surrogate
Date Thu, 14 Jul 2016 21:09:47 GMT
Worthwhile for someone to do, probably. Outside of my current needs. I’m not interested in
being a FOP developer.

From: Glenn Adams [mailto:glenn@skynav.com]
Sent: Thursday, July 14, 2016 1:48 PM
To: FOP Users <fop-users@xmlgraphics.apache.org>
Subject: Re: isolated high surrogate

I'd suggest you test FOP by using an XSL-FO input file directly rather than an XSL template.
Template processing is not part of FOP functionality in the first place.
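A minimal sketch of that separation, assuming the JDK's built-in JAXP transformer (javax.xml.transform). It runs only the XSLT step and returns the resulting XSL-FO, which can then be saved to a file and rendered by FOP on its own. The class name XslStep and the tiny stylesheet are hypothetical stand-ins for the real template.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: runs only the XSLT step, with no FOP involved.
final class XslStep {
    // Hypothetical stand-in stylesheet: wraps @documentName in an fo:block.
    static final String SAMPLE_XSL =
            "<xsl:stylesheet version='1.0'"
            + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
            + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
            + "<xsl:template match='/doc'>"
            + "<fo:block><xsl:value-of select='@documentName'/></fo:block>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    // Transforms the XML source with the given stylesheet and returns the
    // serialized result (the intermediate XSL-FO) as a string.
    static String toFo(String xml, String xsl) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Save this output as a .fo file and give that file to FOP directly.
        System.out.println(toFo("<doc documentName='Report'/>", SAMPLE_XSL));
    }
}
```

If the saved FO file renders cleanly in FOP, the fault lies in the template rather than in FOP.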

On Thu, Jul 14, 2016 at 2:37 PM, Marc Kaufman <marckauf@adobe.com> wrote:
More specifically, if I replace “ ” (a space) with U+200B (zero width space) in the string that
contains surrogate characters, FOP parsing fails even if I just use xsl:value-of. I’m not
going to pursue that at this time. Maybe it should be revisited once FOP handles non-BMP
characters.

Marc

From: Marc Kaufman [mailto:marckauf@adobe.com]
Sent: Thursday, July 14, 2016 12:34 PM
To: fop-users@xmlgraphics.apache.org
Subject: RE: isolated high surrogate

I’ve isolated the problem to a template definition that is trying to replace space characters
with non-breaking spaces. Evidently it clobbers some surrogate pairs. FWIW, here are the offending
lines:

  <xsl:template name="zero_width_space_1">
    <xsl:param name="data"/>
    <xsl:param name="counter" select="0"/>
    <xsl:choose>
      <xsl:when test="$counter &lt; string-length($data)+1">
        <xsl:value-of select='concat(substring($data,$counter,1),"&#8203;")'/>
        <xsl:call-template name="zero_width_space_2">
          <xsl:with-param name="data" select="$data"/>
          <xsl:with-param name="counter" select="$counter+1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template name="zero_width_space_2">
    <xsl:param name="data"/>
    <xsl:param name="counter"/>
    <xsl:value-of select='concat(substring($data,$counter,1),"&#8203;")'/>
    <xsl:call-template name="zero_width_space_1">
      <xsl:with-param name="data" select="$data"/>
      <xsl:with-param name="counter" select="$counter+1"/>
    </xsl:call-template>
  </xsl:template>

So, not an FOP problem.

Marc
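Why a per-position template like this can split a pair is easiest to see in Java, where strings are sequences of UTF-16 code units. XPath 1.0 says a surrogate pair must be treated as a single character, but the sketch below assumes a processor whose substring() and string-length() index code units instead. Under that assumption the template's loop behaves like perCodeUnit: it places a zero width space between the high and low halves of every non-BMP character, leaving an isolated high surrogate. The class and method names are hypothetical, and U+20B9F is just an example non-BMP character.

```java
// Hypothetical demo class contrasting code-unit and code-point iteration.
final class ZwspDemo {
    static final char ZWSP = '\u200B';

    // Appends a ZWSP after every UTF-16 code unit; for a surrogate pair the
    // ZWSP ends up between the high and the low surrogate.
    static String perCodeUnit(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.charAt(i)).append(ZWSP);
        }
        return sb.toString();
    }

    // Appends a ZWSP after every code point, so a supplementary character
    // (stored as a surrogate pair) is copied whole.
    static String perCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.appendCodePoint(cp).append(ZWSP));
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x20B9F)); // one character, two code units
        String naive = perCodeUnit(s);
        String safe = perCodePoint(s);
        // Index 1 of the naive result is the ZWSP, so index 0 is an isolated high surrogate.
        System.out.println(Integer.toHexString(naive.charAt(1)));     // 200b
        System.out.println(Integer.toHexString(safe.codePointAt(0))); // 20b9f
    }
}
```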

From: Marc Kaufman [mailto:marckauf@adobe.com]
Sent: Thursday, July 14, 2016 12:22 PM
To: fop-users@xmlgraphics.apache.org
Subject: RE: isolated high surrogate

I tried that. Doesn’t work. I understand that non-BMP is not supported, and I’m prepared
to live with two .notdef characters in the result, but I’m not sure why I’m getting the
fatal error from the parser.

From: Glenn Adams [mailto:glenn@skynav.com]
Sent: Thursday, July 14, 2016 12:01 PM
To: FOP Users <fop-users@xmlgraphics.apache.org>
Subject: Re: isolated high surrogate

Non-BMP characters are not presently supported by FOP; see [1]. When they are supported, it
would be best to encode them in a file as a single numeric character reference (not two),
e.g., &#x010001;.
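A small sketch of that encoding advice, assuming the text is already available as a Java String. Each non-ASCII code point is written as one hexadecimal numeric character reference, so U+10001 becomes &#x10001; rather than the two surrogate halves &#xD800;&#xDC01;. The two-reference form is not well-formed XML in any case, because surrogate code points are excluded from XML's Char production. The class name CharRefs is hypothetical.

```java
// Hypothetical helper that writes one character reference per code point.
final class CharRefs {
    // Escapes '&' and '<', keeps other ASCII as-is, and writes every other
    // code point (including supplementary ones) as a single &#x...; reference.
    static String toCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp == '&') {
                sb.append("&amp;");
            } else if (cp == '<') {
                sb.append("&lt;");
            } else if (cp < 0x80) {
                sb.append((char) cp);
            } else {
                sb.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A" + new String(Character.toChars(0x10001));
        System.out.println(s.length());    // 3: 'A' plus a surrogate pair
        System.out.println(toCharRefs(s)); // A&#x10001;
        // The two UTF-16 halves that must NOT be written as separate references:
        System.out.println(Integer.toHexString(Character.highSurrogate(0x10001))); // d800
        System.out.println(Integer.toHexString(Character.lowSurrogate(0x10001)));  // dc01
    }
}
```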

[1] https://issues.apache.org/jira/browse/FOP-1969

On Thu, Jul 14, 2016 at 12:51 PM, Marc Kaufman <marckauf@adobe.com> wrote:
I’m stumped by this error:
org.xml.sax.SAXParseException; lineNumber: 92; columnNumber: 51; java.lang.IllegalArgumentException: isolated high surrogate

I have text with surrogate pairs throughout the file, but this only occurs in this context:
    <fo:block padding-top="2em" padding-bottom=".5em" text-align="left" font-family="Kozuka Gothic PR6N" font-size="18pt" color="black">
      <xsl:call-template name="zero_width_space_1">
        <xsl:with-param name="data" select="@documentName"/>
      </xsl:call-template>
    </fo:block>

I’ve checked the input stream, and all the surrogates are correctly paired. I’ve tried
escaping the surrogate pairs (e.g. “&#-integer-;”), but that doesn’t change the
error.
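A check like the one described can be sketched in Java, assuming the decoded text is available as a String. Running it on the transform's output as well as on its input would show whether a later step, such as a per-character template, has separated a pair. The class and method names are hypothetical.

```java
// Hypothetical checker for unpaired UTF-16 surrogates.
final class SurrogateCheck {
    // Returns the UTF-16 index of the first surrogate that is not part of a
    // well-formed high/low pair, or -1 if every surrogate is paired.
    static int firstIsolatedSurrogate(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // well-formed pair: skip its low half
                } else {
                    return i; // high surrogate not followed by a low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate with no high surrogate before it
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String ok = "A" + new String(Character.toChars(0x20B9F)); // A + one non-BMP character
        String broken = "A\uD842\u200B\uDF9F";                    // same pair split by a ZWSP
        System.out.println(firstIsolatedSurrogate(ok));     // -1
        System.out.println(firstIsolatedSurrogate(broken)); // 1
    }
}
```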


