xmlgraphics-fop-users mailing list archives

From Georg Datterl <georg.datt...@geneon.de>
Subject AW: Long URLs appear exempt from word wrap, overflow i-p-d into next column
Date Thu, 19 Jan 2012 08:16:20 GMT
Hi Craig,

I think the main reason is that the hyphenation rules are based on words. While ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate
may be a word, www.health4all.greatshapetoday.com.au most likely is not. If you have a limited
set of URLs and a limited set of languages, you could extend the hyphenation rules. Or you
could add a zero-width space (zwsp) before and after the dots. But even "greatshapetoday" is not a
known English word and is therefore most likely not covered by the hyphenation rules.
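
A minimal sketch of the zwsp idea, assuming the URLs are preprocessed in Python before they reach the XSL-FO. The function name is made up for illustration, and whether fop really breaks at U+200B in your setup should be checked against the rendered output:

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def add_breaks_around_dots(url: str) -> str:
    """Return url with a zero-width space before and after every dot.

    Illustrative helper only; it is not part of fop or any library.
    """
    return url.replace(".", ZWSP + "." + ZWSP)


broken = add_breaks_around_dots("www.health4all.greatshapetoday.com.au")
# The visible text is unchanged; only invisible break opportunities are added.
```

In hand-written XSL-FO the same character can be written as the numeric character reference &#x200B;.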

Regards,

Georg Datterl

------ Contact ------

Georg Datterl

Geneon media solutions gmbh
Gutenstetter Straße 8a
90449 Nürnberg

HRB Nürnberg: 17193
Managing director: Yong-Harry Steiert

Tel.: 0911/36 78 88 - 26
Fax: 0911/36 78 88 - 20

www.geneon.de

Other members of the Willmy MediaGroup:

IRS Integrated Realization Services GmbH:    www.irs-nbg.de
Willmy PrintMedia GmbH:                      www.willmy.de
Willmy Consult & Content GmbH:               www.willmycc.de

-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Thursday, 19 January 2012 03:27
To: fop-users@xmlgraphics.apache.org
Subject: Long URLs appear exempt from word wrap, overflow i-p-d into next column

Hi

I'm in the later stages of building a classified ad pagination system that uses Apache FOP as its layout
backend.

I'm running into issues where long URLs aren't broken across lines when they're too
long to fit on a line. They just spill past the inline-progression-dimension (i-p-d) of the
block that contains them and overlay text in the next column.

I'm using fop 1.0 and for testing am using only the built-in fonts.
Output is direct to PDF from fop, though the same issue can be reproduced with PostScript
output. Links to test case files are at the bottom of this email.

This does not happen with other very long words, e.g. ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate,
which is hyphenated after "Word" in my test and prints just fine.

In the area tree the problem URLs look normal and aren't annotated with any sort of special
markup, but they aren't broken between lines even when they don't fit. Fop must know they
don't fit by the time it writes the area tree, because it has the required font and device metrics.

The issue is not specific to any particular font: it affects the default
sans-serif font as well as the font I normally use, Myriad Pro SemiCondensed. The samples linked
to below were generated using only fop's built-in fonts so that others can run them easily,
so please forgive the crappy appearance.

As you can see in the sample PDF, the URLs
  www.health4all.greatshapetoday.com.au
(p1, 3rd col, 2nd below "health and beauty" heading) and
  www.perthrentalapartments.com.au
(p2, 1st col, 5th below "to let" heading) are overflowing their available i-p-d.

The area tree output for these URLs is the same as for any other non-problem
line:

<lineArea><text><word offset="0">the.long.url.</word></text></lineArea>

I haven't declared anything special about these URLs and would expect them to be either broken
or "squished" into the line, preferably the latter (though I don't think fop supports horizontal
scaling of text yet).

I'm wondering if fop is detecting that these are URLs and applying some special formatting
rules, since they seem to be hyperlinked in Acrobat.
Is it that, or are URLs something the hyphenation/breaks/layout algorithm just doesn't cope
well with?

Has anyone else run into similar issues here? If so, have you found any workarounds or solutions?

For now I'm probably going to look for long URLs in the input text and add zero-width spaces
(or some similar nonprinting char) to them at promising-looking points, so fop has something
to break on. I'd love a better solution to what must be a relatively common problem, though.
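
Roughly, that preprocessing could look like the sketch below. It is only an illustration: the URL pattern, the 20-character threshold and the function name are all assumptions, and it adds a break opportunity only after a dot or slash that is followed by a letter, digit or underscore.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Assumed, deliberately simple notion of "URL": a run of non-space
# characters that starts with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")

# A dot or slash followed by a word character. This skips the first
# slash of "//" and any trailing dot at the end of a sentence.
BREAK_POINT = re.compile(r"[./](?=\w)")


def soften_long_urls(text: str, min_len: int = 20) -> str:
    """Insert zero-width spaces into URLs of at least min_len characters."""

    def fix(match: re.Match) -> str:
        url = match.group(0)
        if len(url) < min_len:
            return url  # short URLs are left untouched
        return BREAK_POINT.sub(lambda m: m.group(0) + ZWSP, url)

    return URL_RE.sub(fix, text)


sample = "See www.perthrentalapartments.com.au or www.a.com today."
softened = soften_long_urls(sample)
```

Only the long URL in the sample is changed; the short one is left alone so that ordinary lines are not affected.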


In case it matters, the problem URLs are in a <fo:block> in cells of a 1-col table flowed
into columns. The text of interest is located in a path like this:

fo:root/fo:page-sequence/fo:flow/fo:block[id=columnInnerBlock]
  /fo:table/fo:table-body/fo:table-row/fo:table-cell/fo:block

The outer fo:block[id=columnInnerBlock] is used to generate some borders around the columns;
it only contains tables and never any raw text content.


XSL-FO source, please ignore missing image from header as it doesn't affect the layout:

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.xml

The problem areas are line 387 and line 1003, the URLs noted above.

PDF generated with "fop test.xsl test.pdf". I usually use an embedded fop instance that generates
an area tree which I post-process and feed back into fop, but for the purposes of this test
case I've used a standalone fop.

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.pdf

--
Craig Ringer

---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org

