xmlgraphics-fop-users mailing list archives

From Andreas Delmelle <andreas.delme...@telenet.be>
Subject Re: FOP hyphenate-ladder-count and pre-hyphenated text
Date Thu, 07 May 2015 22:40:30 GMT
<snip />
> Internally, what FOP tries to hyphenate are only the word fragments in between the spaces
> and SHYs, but those cannot be broken up further themselves. As strange as it sounds, a SHY
> is treated as a word boundary here, similar to a zero-width space.

Correction/Expansion:
ZWSP is actually a bad analogy, and is treated quite differently by the Unicode line-breaking
algorithm, since the character *is* the break opportunity (=break *on* or *before*).
With SHY, the break opportunity lies *after* the character.

Still, internally for FOP, the result is roughly the same. The accumulated sequence of characters
since the previous break opportunity is taken to be a 'word', which may or may not end in
a hyphen. If it does, a specific sequence of elements is glued to the word-box,
to prevent a break before the SHY and to make sure that it is properly rendered, i.e. it only
counts if the break occurs right after it.

As hyphenation by FOP itself is applied at a higher level, when all layout elements for a
whole paragraph have been collected, that SHY sequence is seen as a word boundary. That is,
that part of the algorithm just accumulates the text for 'uninterrupted' sequences of word-boxes,
and feeds those pieces to the hyphenator. The real intention is to apply hyphenation across
any nested fo:inlines. 'Uninterrupted' means that auxiliary elements, generated for border
or padding, are explicitly *not* considered as word boundaries. The sequence generated for
SHY contains two non-auxiliary elements, as if it were a space. Perhaps that is just to ensure
that that position in the layout always leads to a character that is visibly rendered.

In case of pre-hyphenated text, this has the unintended effect of restricting the input for
the hyphenator to parts of words, which is basically meaningless (and wasteful).
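
To illustrate the effect, here is a minimal sketch in Java. It is not FOP code: the class
ShyFragments and the method hyphenatorInput are hypothetical names, and the splitting logic is
only an assumption that mimics "accumulate text until the next word boundary". It shows what a
hyphenator would receive when SHY counts as a boundary, compared to when only spaces do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this does not reflect FOP's actual classes.
class ShyFragments {
    static final char SHY = '\u00AD'; // SOFT HYPHEN

    // Collects the pieces of text that would be handed to a hyphenator.
    // If shyIsBoundary is true, SHY ends the current piece, just like a space.
    // If false, SHY is dropped and the surrounding fragments stay one word.
    static List<String> hyphenatorInput(String text, boolean shyIsBoundary) {
        List<String> pieces = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean boundary = Character.isWhitespace(c) || (shyIsBoundary && c == SHY);
            if (boundary) {
                if (current.length() > 0) {
                    pieces.add(current.toString());
                    current.setLength(0);
                }
            } else if (c != SHY) {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            pieces.add(current.toString());
        }
        return pieces;
    }

    public static void main(String[] args) {
        String text = "pre\u00ADhyphen\u00ADated text";
        // SHY as word boundary: the hyphenator only sees fragments.
        System.out.println(hyphenatorInput(text, true));  // [pre, hyphen, ated, text]
        // SHY not a boundary: the hyphenator sees the whole word.
        System.out.println(hyphenatorInput(text, false)); // [prehyphenated, text]
    }
}
```

With SHY as a boundary, the hyphenator is asked about "pre", "hyphen" and "ated" separately,
which are too short or too arbitrary for pattern-based hyphenation to do anything useful with.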

BTW, there is an entry related to rendering of SHY logged in JIRA already, but more specifically
about copy-paste functionality: https://issues.apache.org/jira/browse/FOP-2358

Seems like some refactoring may be in order here, to streamline and better merge the two approaches.

@Marc: Will you log a request for enhancement in JIRA for this, or shall I?


KR

Andreas