xmlgraphics-fop-users mailing list archives

From Chris Bowditch <bowditch_ch...@hotmail.com>
Subject Re: pdf-image: Handling size blow-out caused by fonts, any way to coalesce/merge multiple subsets of same font?
Date Tue, 13 Dec 2011 17:07:29 GMT
On 09/12/2011 06:57, Craig Ringer wrote:
> Hi all

Hi Craig,

> With pdf-image, is there any way to coalesce or merge multiple 
> different subsets of the same font into a single font subset with no 
> duplicate glyphs? Eg 50 different "Helvetica (subset)" instances into 
> a single font in the output document?
> Background:
> I've just got Jeremias's pdf-image extension integrated into my code. 
> It worked perfectly and immediately with little effort, which was 
> delightful. Thank you *VERY* much Jeremias for publishing that, it's a 
> fantastic tool and I'd love to see it in fop core.
> I'm encountering an unexpected issue with it, though: the PDFs 
> produced by fop are *huge*. Examination with Acrobat Pro suggests that 
> 90% of the space is taken up by fonts. Looking at the font list, I see 
> huge numbers of copies of "Helvetica (subset)", "Helvetica Black 
> (subset)" etc. That makes sense, since all the input PDFs have fonts 
> embedded, and many use the same fonts. However, I'm including up to 
> 1000 PDFs in each output PDF so the size adds up to prohibitive levels.

We have the same problem and have been trying to find a solution. There 
is a cache within the PDF plug-in, but as soon as you change the way it 
works, memory usage seems to balloon. We did manage to de-duplicate the 
fonts, though, and we're still investigating the memory issue. If we 
find a solution we will let you know.

> I'm wondering if there's any way to tell the pdf-image extension to 
> embed certain fonts fully from supplied font files and avoid copying 
> the matching subsets over from the input PDFs. If there isn't anything 
> like that, any idea how practical it'd be?
> For that matter, is the idea of collecting up all the subsets of a 
> font as each pdf-image is embedded, then merging them into a single 
> new embedded subset at the end completely insane? Or is it potentially 
> practical? For that matter just keeping track of which glyphs are 
> defined in each subset and building a new subset from a master font 
> file at the end that included all those glyphs would help a lot.
> I'm *really* hoping to avoid having to keep on using EPS input and 
> PostScript output to PDF via Distiller, so I'm willing to put some 
> work into this.
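
To make the glyph-tracking idea above concrete, here is a minimal sketch of
the bookkeeping side only. It is not how the pdf-image extension works, and
the class and method names (GlyphUnionTracker, recordSubset, mergedGlyphs)
are hypothetical. It assumes each subset can be reduced to a set of glyph
names, and that subset fonts are named with the usual six-uppercase-letter
tag followed by "+" (for example "ABCDEF+Helvetica").

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: collects the glyphs used by every subset of a font. */
public class GlyphUnionTracker {

    // Base font name -> union of glyph names seen in any subset of that font.
    private final Map<String, Set<String>> glyphsByFont = new TreeMap<>();

    /** Strips a subset tag such as "ABCDEF+" from a PDF font name, if present. */
    public static String baseFontName(String pdfFontName) {
        if (pdfFontName.matches("[A-Z]{6}\\+.+")) {
            return pdfFontName.substring(7); // six letters plus the '+'
        }
        return pdfFontName;
    }

    /** Records the glyphs of one embedded subset under its base font name. */
    public void recordSubset(String pdfFontName, Set<String> glyphNames) {
        glyphsByFont
            .computeIfAbsent(baseFontName(pdfFontName), k -> new TreeSet<>())
            .addAll(glyphNames);
    }

    /** Returns every glyph name recorded so far for the given base font. */
    public Set<String> mergedGlyphs(String baseName) {
        Set<String> glyphs = glyphsByFont.get(baseName);
        return glyphs == null
            ? Collections.<String>emptySet()
            : Collections.unmodifiableSet(glyphs);
    }

    public static void main(String[] args) {
        GlyphUnionTracker tracker = new GlyphUnionTracker();
        // Two subsets of the same font, as they might come from two input PDFs.
        tracker.recordSubset("ABCDEF+Helvetica",
            new TreeSet<>(Arrays.asList("A", "B", "space")));
        tracker.recordSubset("GHIJKL+Helvetica",
            new TreeSet<>(Arrays.asList("B", "C", "space")));
        // The union is what a single rebuilt subset would need to contain.
        System.out.println(tracker.mergedGlyphs("Helvetica"));
    }
}
```

The union per base font is the glyph list a single subset rebuilt from a
master font file would need. A real merge would have more to deal with,
since subsets can use different encodings or glyph IDs, and not every
embedded font exposes glyph names.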

An alternative that we plan to use, if the memory issues with the 
plug-in can't be solved, is to generate the FOP output separately from 
the static PDFs you are importing, then use PDFBox in a post-processing 
step to join the PDFs together at the end. Not ideal, but it works.



> -- 
> Craig Ringer
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
> For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org
