xmlgraphics-fop-users mailing list archives

From Craig Ringer <cr...@postnewspapers.com.au>
Subject Long URLs appear exempt from word wrap, overflow i-p-d into next column
Date Thu, 19 Jan 2012 02:27:14 GMT
Hi

I'm in the later stages of building a classified ad pagination system
that uses Apache FOP for its layout backend.

I'm running into issues where long URLs aren't getting broken across
lines when they're too long to fit on a line. They're just spilling out
of the i-p-d of the block they're contained by and overlaying text in
the next column.

I'm using fop 1.0 and for testing am using only the built-in fonts.
Output is direct to PDF from fop, though the same issue can be
reproduced with PostScript output. Links to test case files are at the
bottom of this email.

This does not happen with other very long words, e.g.
ThisIsAnExtremelyLongWordFullOfPlacesWeCanHyphenate, which is hyphenated
after "Word" in my test and prints just fine.

In the area tree the problem URLs look normal and aren't annotated with
any sort of special markup, but they aren't broken between lines even
when they wouldn't fit. Fop would know this by the time it output the
area tree because it has the required font and device metrics.

The issue is not specific to any particular font: it affects the default
sans-serif font as well as the font I'm using, Myriad Pro SemiCondensed.
The samples linked to below have been generated using only fop's
built-in fonts to make sure it's easy for others to run them, so please
forgive the crappy appearance.

As you can see in the sample PDF, the URLs
  www.health4all.greatshapetoday.com.au
(p1, 3rd col, 2nd below "health and beauty" heading) and
  www.perthrentalapartments.com.au
(p2, 1st col, 5th below "to let" heading)
are overflowing their available i-p-d.

The area tree output for these URLs is the same as any other non-problem
line:

<lineArea><text><word offset="0">the.long.url.</word></text></lineArea>

I haven't declared anything special about these URLs and would expect
them to be either broken or "squished" into the line, preferably the
latter (though I don't think fop supports horizontal scaling of text yet).

I'm wondering if fop is detecting that these are URLs and applying some
special formatting rules, since they seem to be hyperlinked in Acrobat.
Is it that, or are URLs something the hyphenation/breaks/layout
algorithm just doesn't cope well with?

Has anyone else run into similar issues here? If so, have you found any
workarounds or solutions?

For now I'm probably going to look for long URLs in the input text and
add zero-width spaces (or some similar nonprinting char) to them at
promising looking points, so fop has something to break on. I'd love a
better solution to what must be a relatively common problem, though.
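
A rough sketch of that workaround, in Python for brevity (my real
pipeline differs). The function name, the "www." / "http://" heuristic,
the set of break characters and the 20-character threshold are all
illustrative assumptions, and it only helps if fop treats U+200B as a
break opportunity:

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

# Assumed heuristic: a URL-like token starts with a scheme or "www."
# and runs up to the next whitespace character.
URL_TOKEN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Punctuation after which a break looks acceptable, but only when a
# letter or digit follows, so "://" and a trailing "." are left alone.
BREAK_AFTER = re.compile(r"([./\-_?&=])(?=[A-Za-z0-9])")


def add_break_hints(text, min_len=20):
    """Insert zero-width spaces into long URL-like tokens in text."""

    def soften(match):
        url = match.group(0)
        if len(url) < min_len:
            return url  # short enough to fit, leave untouched
        return BREAK_AFTER.sub(r"\1" + ZWSP, url)

    return URL_TOKEN.sub(soften, text)
```

The input text would be passed through add_break_hints() before it is
written into the XSL-FO, so the visible characters stay the same and
only invisible break points are added.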


In case it matters, the problem URLs are in a <fo:block> in cells of a
1-col table flowed into columns. The text of interest is located in a
path like this:

fo:root/fo:page-sequence/fo:flow/fo:block[id=columnInnerBlock]
  /fo:table/fo:table-body/fo:table-row/fo:table-cell/fo:block

The outer fo:block[id=columnInnerBlock] is used to generate some borders
around the columns; it only contains tables, never any raw text content.
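
To find candidate tokens in the FO source before running fop, something
along these lines should work. It is a sketch only: it assumes the
structure is exactly the path shown above, the function name and the
30-character threshold are made up for illustration, and it uses
Python's standard ElementTree rather than anything from fop itself.

```python
import xml.etree.ElementTree as ET

# The XSL-FO namespace, bound to the usual "fo" prefix.
FO = {"fo": "http://www.w3.org/1999/XSL/Format"}

# The path from above, relative to fo:root.
CELL_BLOCKS = (
    "fo:page-sequence/fo:flow/fo:block[@id='columnInnerBlock']"
    "/fo:table/fo:table-body/fo:table-row/fo:table-cell/fo:block"
)


def long_tokens(fo_xml, min_len=30):
    """Return whitespace-delimited tokens of at least min_len characters
    found in the table-cell blocks of an XSL-FO document string."""
    root = ET.fromstring(fo_xml)  # the fo:root element
    found = []
    for block in root.findall(CELL_BLOCKS, FO):
        text = "".join(block.itertext())  # includes text of nested inlines
        found.extend(tok for tok in text.split() if len(tok) >= min_len)
    return found
```

This only measures character counts, not rendered width, so it flags
suspects rather than proving that a token will overflow.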


XSL-FO source, please ignore missing image from header as it doesn't
affect the layout:

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.xml

The problem areas are line 387 and line 1003, the URLs noted above.

PDF generated with "fop test.xsl test.pdf". I usually use an embedded
fop instance that generates an area tree which I post-process and feed
back into fop, but for the purposes of this test case I've used a
standalone fop.

http://www.postnewspapers.com.au/~craig/webfiles/testcases/fop-break-url/fop-break-url.pdf

--
Craig Ringer
