xmlgraphics-fop-users mailing list archives

From Andreas Delmelle <andreas.delme...@telenet.be>
Subject Re: Undesirable line breaks
Date Mon, 15 Sep 2008 16:43:43 GMT
On Sep 15, 2008, at 09:05, Ryan Lortie wrote:

> On Sun, 2008-09-14 at 01:49 +0200, Andreas Delmelle wrote:
>> At what point? I assume it's right before the '+', correct?
> Correct.
>
>> If the layout engine uses Unicode TR#14 as reference to determine the
>> line-breaks, then a break between 'k' and '+' would be allowed. '+'
>> belongs to the class of Numeric Prefix characters (PR), and as such
>> allows a break before but not a break after. (see:
>> http://www.unicode.org/reports/tr14/#DescriptionOfProperties)
>
> I was not aware of this standard.  I find that to be a rather odd
> choice to make (in the meantime I've thought of other common cases
> like "A+" and "C++", etc.).  Oh well :)

Indeed, but those are actually not so common. That is: it is more  
common for a '+' to appear in the context of a numerical/mathematical  
expression than following regular alphabetic characters. If the '+'  
appears as an operator in a long mathematical addition which is  
broken, one would most commonly prefer to see it as the first  
character on the next line, I believe...

In the uncommon cases, as I hinted, the most straightforward
workaround (currently) is to have a word-joiner (U+2060) or a
zero-width-no-break-space (U+FEFF) precede the '+' to steer the
pair-based algorithm in the right direction.
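As a rough illustration of that workaround, the text could be pre-processed before it reaches the stylesheet. The sketch below is only an assumption about how one might do this outside FOP: the function name `protect_plus` is made up for the example, and it glues every '+' in a run such as "C++" to what precedes it, while leaving '+' after digits or spaces untouched.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER

# A run of '+' directly preceded by a letter.
# [^\W\d_] matches roughly any Unicode letter (word char, not digit, not '_').
_PLUS_AFTER_LETTER = re.compile(r"(?<=[^\W\d_])\++")


def protect_plus(text: str) -> str:
    """Insert U+2060 before each '+' of a run that directly follows a letter.

    Hypothetical helper, not part of FOP.
    """
    return _PLUS_AFTER_LETTER.sub(
        # "++" becomes WJ "+" WJ "+"
        lambda m: WORD_JOINER + WORD_JOINER.join(m.group(0)),
        text,
    )
```

Applied to "Gtk+" this yields "Gtk", U+2060, "+"; a plain expression like "1 + 2" passes through unchanged, so mathematical text can still break before the operator.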

On another note, the Unicode Technical Report does offer room for
exceptions/customizations (as described in:
http://www.unicode.org/reports/tr14/#Customization), but FOP currently
'only' implements the basic algorithm. This 'only' points to a
limitation, but apart from some quirky exceptions, this basic
implementation already covers a great many of the line-breaking rules
taken for granted in a lot of different contexts/languages. The more
notable exceptions are the special line-breaking rules for Japanese
and a variant of Korean. OTOH, the rules for languages like Chinese,
Hebrew and Arabic are covered by TR#14. (That is: only the
line-breaking. FOP still has severe issues with the actual typesetting
of Arabic, for example. Although the line-breaks will be determined
correctly, FOP does not do any glyph-merging for inner-word
ligatures... Each codepoint remains a separate character in the
output.)

>> Another alternative would be something like:
>> <fo:wrapper keep-together.within-line="always">Gtk+</fo:wrapper>
>
> With all of these workarounds it's getting to the point where nearly
> every part of the output from my stylesheets is littered with millions
> of <inline> elements :)

Probably better in your case to insert auxiliary codepoints then.

> To be more specific about what I was wondering about: is there any way
> to tell FOP in a general sense "please be less intelligent, and only
> break on ASCII space characters."?

In a way, you could override the behavior for 'AL followed by PR',
such that this will also lead to an indirect break (i.e. only break
if there is a space between the letter and the prefix-character).
BUT... for the moment, since the matter of customization of the
Unicode algorithm has not been addressed completely, it means you'll
end up with a customized FOP-build.
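To make the difference between a direct and an indirect break concrete, here is a toy sketch of a pair-based lookup. It is not FOP's implementation: the classifier only knows three classes, and every table cell except AL/PR is an illustrative assumption. Only the symbols ('_' direct, '%' indirect, '^' prohibited) and the AL/PR change follow the discussion above.

```python
DIRECT, INDIRECT, PROHIBITED = "_", "%", "^"

# Toy pair table: (class before, class after) -> break action.
# Only the AL/PR cell reflects the discussion; the rest are assumptions.
DEFAULT_TABLE = {
    ("AL", "AL"): INDIRECT,
    ("AL", "PR"): DIRECT,
    ("PR", "AL"): INDIRECT,
    ("PR", "PR"): INDIRECT,
}
# The customization: AL followed by PR becomes an indirect break.
CUSTOM_TABLE = {**DEFAULT_TABLE, ("AL", "PR"): INDIRECT}


def line_break_class(ch):
    """Grossly simplified classifier: space, '+', everything else is AL."""
    if ch == " ":
        return "SP"
    if ch == "+":
        return "PR"
    return "AL"


def break_opportunities(text, table):
    """Return the indices i where a line break is allowed before text[i]."""
    result = []
    prev = None  # class of the last non-space character seen
    for i, ch in enumerate(text):
        cls = line_break_class(ch)
        if cls == "SP":
            continue
        if prev is not None:
            action = table[(prev, cls)]
            spaces_between = text[i - 1] == " "
            # Direct: always allowed. Indirect: only across spaces.
            if action == DIRECT or (action == INDIRECT and spaces_between):
                result.append(i)
        prev = cls
    return result


print(break_opportunities("Gtk+ rocks", DEFAULT_TABLE))  # [3, 5]
print(break_opportunities("Gtk+ rocks", CUSTOM_TABLE))   # [5]
```

With the default cell, index 3 (right before the '+') is a break opportunity; with the customized cell, the only remaining opportunity is after the space.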

It is rather easy, but definitely not recommended:
1° download the source distribution (or check out the trunk via SVN)
2° modify the file 'src/codegen/unicode/data/LineBreakPairTable.txt'.
The characters representing the different types of break-opportunity
are available at http://www.unicode.org/reports/tr14/#ExampleTable,
or in the source file
'src/codegen/unicode/java/org/apache/fop/text/linebreak/GenerateLineBreakUtils.java'.
In short: the character in the grid at row AL / column PR would have
to be '%' instead of '_'.
3° after that, run 'ant codegen-unicode'
4° run the standard 'ant package'

Ideally, we should be looking for an approach where the user has the  
option of adding an overriding pair-table (for all or some of the  
combinations of classes), such that it would no longer be necessary  
to regenerate the class in question.

The downside currently is that there may be side-effects for some  
other cases, where the basic pair-table offered by Unicode does  
generate the expected break-opportunity...
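
The overriding pair-table idea mentioned above could look roughly like the following. This is purely a sketch of the concept: FOP offers no such option at the moment, and the names `BASE_TABLE` and `effective_table` are invented for the example. The base table is an illustrative two-cell excerpt, not the real data.

```python
from collections import ChainMap

# Illustrative excerpt of a base pair table: (before, after) -> action.
BASE_TABLE = {
    ("AL", "PR"): "_",  # direct break
    ("PR", "AL"): "%",  # indirect break
}


def effective_table(base, overrides):
    """Layer user overrides on top of the base table without copying it.

    Lookups hit `overrides` first and fall back to `base`.
    """
    return ChainMap(overrides, base)


# A user only supplies the cells that should differ.
table = effective_table(BASE_TABLE, {("AL", "PR"): "%"})
```

Since the base table stays untouched, only the overridden combinations change behavior, which would also keep the possible side-effects limited to the cells the user explicitly lists.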



Cheers

Andreas

