xmlgraphics-fop-users mailing list archives

From "dan.mccabe" <mccabe.danie...@gmail.com>
Subject Re: large image embedding problems
Date Wed, 16 Sep 2009 18:59:54 GMT

Hey Jeremias,

First and foremost, I want to thank you for all your help.  I was able to
follow along with your instructions and get an implementation that could
embed raw JP2 images into a PDF in a pretty short amount of time, which I
definitely couldn't have done without your help.  If you have an interest in
the source code for this, I would be more than happy to share it.  The one
caveat with it currently is that it relies on the JAI ImageIO project
(https://jai-imageio-core.dev.java.net/) to be able to parse the header for
the JP2 file in my PreloadJP2 class, so I'm not sure if it would be possible
or not to distribute it with the graphics commons library (it's under the
BSD license).  I took a look under the hood and it appears that it uses
JJ2000 (http://jj2000.epfl.ch/; http://jpeg2000.epfl.ch/) to do some of the
heavy lifting, so if licensing is a problem, there may be some alternatives
that can be explored.
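
One possible alternative: the fields a preloader needs (width, height, number of components) sit in the "ihdr" box near the start of a JP2 file, so they may be readable without JAI ImageIO at all. Below is a minimal sketch in plain Java, assuming the standard JP2 box layout (a 12-byte signature box, then length/type-prefixed boxes, with "ihdr" nested inside the "jp2h" superbox). The class name Jp2HeaderSketch is hypothetical and is not part of FOP or XML Graphics Commons; it ignores resolution, colour space, and raw codestreams without the JP2 wrapper.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: reads basic image info from a JP2 file header. */
final class Jp2HeaderSketch {

    private static final int BOX_SIGNATURE = 0x6A502020; // "jP  "
    private static final int BOX_JP2H = 0x6A703268;      // "jp2h" superbox
    private static final int BOX_IHDR = 0x69686472;      // "ihdr" image header

    /** Returns {width, height, componentCount} without decoding any pixels. */
    static int[] readSize(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        // Signature box: length 12, type "jP  ", content 0x0D0A870A.
        if (in.readInt() != 12 || in.readInt() != BOX_SIGNATURE
                || in.readInt() != 0x0D0A870A) {
            throw new IOException("Not a JP2 file");
        }
        while (true) {
            long length = in.readInt() & 0xFFFFFFFFL; // box length incl. header
            int type = in.readInt();
            long headerSize = 8;
            if (length == 1) {            // 64-bit extended length follows
                length = in.readLong();
                headerSize = 16;
            }
            if (type == BOX_JP2H) {
                continue;                 // superbox: step into its children
            }
            if (type == BOX_IHDR) {
                int height = in.readInt(); // height is stored before width
                int width = in.readInt();
                int components = in.readUnsignedShort();
                return new int[] {width, height, components};
            }
            if (length == 0) {            // box runs to end of file
                throw new IOException("No ihdr box found");
            }
            skipFully(in, length - headerSize);
        }
    }

    /** Skips exactly n bytes or fails if the stream ends early. */
    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
            if (skipped <= 0) {
                throw new EOFException("Truncated JP2 box");
            }
            n -= skipped;
        }
    }
}
```

Calling Jp2HeaderSketch.readSize(new FileInputStream(file)) only reads the first few boxes, so the image data itself never has to be loaded.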

However, after getting familiar with the FOP source code, I think I've found
the root cause of the problem we were experiencing, which has caused us to
go with a slightly different implementation.  I came across the issue after
I had implemented the JP2 support and I was still not seeing the
transparency we needed in the resulting PDF.  I compared the resulting PDF
with one we had generated using PNGs and noticed that all of the PNGs had a
soft-mask associated with them while none of the JP2s did.  After taking a
look at the implementation, I found that I needed to add code to my
ImageRawJP2Adapter to return a soft-mask reference when there was
transparency in the image.

This all worked fine, but it dawned on me that because the transparency was
controlled by the soft-mask and not by the type of image itself, there was
no reason we couldn't use JPG files as long as we found a way to specify an
accompanying mask for it.  Because of some issues with rendering JP2 files
(we were using im4java as an interface to ImageMagick to generate the
images, but there are some hoops you have to jump through to get ImageMagick
to run on some machines), it was definitely preferable to use JPGs if
possible.  The solution we settled on lives in our custom image handler,
which generates the images in the SVG: we take the BufferedImage that needs
to be saved and write out two JPG files for it, one for the image and one
for the mask.  In the setup method in ImageRawJPEGAdapter, I put in some
custom code to check for this accompanying mask file and add a soft-mask
using it if it is available.  This is definitely a bit of a hack, but it'll
work for now, so we should be good.
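
The splitting step can be sketched roughly as follows, using only java.awt.image and javax.imageio. This is not our actual handler code: the class name AlphaSplitSketch and the ".mask.jpg" naming convention are hypothetical, and the FOP side (checking for the mask file and attaching the soft-mask) is not shown.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical helper: splits an image with alpha into colour + mask JPEGs. */
final class AlphaSplitSketch {

    /** Copies the colour channels into an opaque RGB image. */
    static BufferedImage colorPart(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                rgb.setRGB(x, y, src.getRGB(x, y)); // alpha byte is dropped
            }
        }
        return rgb;
    }

    /** Copies the alpha channel into an 8-bit grayscale image (255 = opaque). */
    static BufferedImage maskPart(BufferedImage src) {
        BufferedImage mask = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = mask.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int alpha = src.getRGB(x, y) >>> 24; // top byte of ARGB
                raster.setSample(x, y, 0, alpha);
            }
        }
        return mask;
    }

    /** Assumed naming convention: "page1.jpg" gets the mask "page1.mask.jpg". */
    static File maskFileFor(File imageFile) {
        String name = imageFile.getName();
        int dot = name.lastIndexOf('.');
        String base = dot < 0 ? name : name.substring(0, dot);
        return new File(imageFile.getParentFile(), base + ".mask.jpg");
    }

    /** Writes both JPEG files for one source image. */
    static void write(BufferedImage src, File imageFile) throws IOException {
        if (!ImageIO.write(colorPart(src), "jpg", imageFile)
                || !ImageIO.write(maskPart(src), "jpg", maskFileFor(imageFile))) {
            throw new IOException("No JPEG writer available");
        }
    }
}
```

Per-pixel getRGB calls are slow for 4000x4000 images, so a real implementation would probably copy whole rasters instead; the sketch favours readability. Note also that JPEG compression is lossy, so the mask edges will not be pixel-exact.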

I spent some time looking into the relationship between RenderedImage,
ImageRendered, ImageRenderedAdapter, PDFDocument, and PDFImageXObject, and
it appears that the images should get garbage collected properly under
normal circumstances (thanks to the overridden output( OutputStream ) method
in PDFImageXObject).  I also spent some time tracing through the code for
rendering SVGs to PDF, and it looked like the images got cleaned up
correctly there too.  However, what is clear is that whenever
ImageRenderedAdapter gets used with our application, OutOfMemoryErrors will
ensue.  The images shouldn't be too large to fit in memory altogether
though, so I'm not entirely sure what was causing the issue.  When it does
go down this path, the program usually gets through a couple pages before it
errors out, which originally made me think that it was maybe holding images
in memory for longer than they needed to be, but now I'm not so sure.
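
A rough estimate may be useful here. Assuming the figures from Bill's original post (4000x4000 images, four per page, 50 pages) and 4 bytes per pixel once an image is decoded to ARGB, the sketch below shows how quickly decoded images could add up if more than a few are alive at once. The class name is hypothetical, and the numbers ignore any JVM or library overhead.

```java
/** Hypothetical back-of-the-envelope estimate of decoded image sizes. */
final class DecodedSizeSketch {

    /** Bytes needed to hold one fully decoded image. */
    static long decodedBytes(int width, int height, int bytesPerPixel) {
        return (long) width * height * bytesPerPixel;
    }

    public static void main(String[] args) {
        long perImage = decodedBytes(4000, 4000, 4); // one ARGB image
        long perPage = 4 * perImage;                 // four images per page
        long perDocument = 50 * perPage;             // fifty pages
        System.out.println("per image:    " + perImage / 1_000_000 + " MB");    // 64 MB
        System.out.println("per page:     " + perPage / 1_000_000 + " MB");     // 256 MB
        System.out.println("per document: " + perDocument / 1_000_000 + " MB"); // 12800 MB
    }
}
```

Under these assumptions a single page of decoded images already needs about 256 MB, so even a short delay in releasing them could exhaust a modest heap after a couple of pages.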

This may not be any news to you, but I figured as long as I had done some
research into figuring out what the problem was, I would share it in case it
was helpful.  Thanks again for all the help!


Jeremias Maerki-2 wrote:
> Hi Dan,
> I'm afraid I don't see any other possibility than to implement this
> properly. At least the good news is that with JPEG you've got a full
> example of how to embed that format uncompressed into a PDF. Here are
> some pointers on what needs to be done:
> XML Graphics Commons:
> http://xmlgraphics.apache.org/commons/image-loader.html
> [1]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/impl/ImageRawJPEG.java?view=markup
> [2]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/impl/ImageLoaderRawJPEG.java?view=markup
> [3]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/impl/PreloaderJPEG.java?view=markup
> [4]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/image/loader/impl/ImageLoaderFactoryRaw.java?view=markup
> [5]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/resources/META-INF/services/org.apache.xmlgraphics.image.loader.spi.ImagePreloader?view=markup
> [6]
> http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/resources/META-INF/services/org.apache.xmlgraphics.image.loader.spi.ImageLoaderFactory?view=markup
> FOP:
> [7]
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/PDFImageHandlerRawJPEG.java?view=markup
> [8]
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/render/pdf/ImageRawJPEGAdapter.java?view=markup
> [9]
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/META-INF/services/org.apache.fop.render.ImageHandler?view=markup
> [10]
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/pdf/
> [11]
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/java/org/apache/fop/pdf/PDFDocument.java?view=markup
> First of all, you need a plug-in for the image loading framework in XML
> Graphics Commons. The "preloader" [3] is responsible for detecting the
> file format and extracting some basic information about the image
> (hopefully without loading the full image already). That way the layout
> engine doesn't have to load the full image into memory. Only the
> renderer needs access to the full image. In the case of JPEG, it doesn't
> even have to be loaded in memory. Hopefully, the same will be possible
> for JPEG 2000. The preloader needs to be registered in [5].
> The second step is providing an image class representing the undecoded
> JPEG 2000 image [1]. Then you need a loader that builds that
> representation [2] and a factory with metadata for the loader [4].
> Once you have that FOP will be able to provide a JPEG 2000 image in its
> raw format. At this point, you'll have to teach FOP how to make use of
> that. A PDF-specific image handler [7] (which is also a plug-in [9])
> needs to be built. Its presence will tell the image loading framework
> that it can provide JPEG 2000 images in raw format. Otherwise, it will
> simply check if ImageIO has a codec for JPEG 2000 (but this means the
> image gets decoded). The image handler then uses an image adapter [8] to
> finally embed the image into the PDF. I assume you will also need a few
> modifications in FOP's PDF library to support the JPXDecode filter [10].
> Since JPXDecode is a PDF 1.5 feature, you will also need to introduce a
> switch [11] between PDF 1.4 and 1.5. That is necessary because of PDF/A
> and PDF/X functionality which require keeping PDF on version 1.4. So
> JPEG 2000 should only be available when PDF 1.5 is enabled.
> I guess one of the first steps should also be studying the JPEG 2000
> specification and the PDF specification so you can decide whether the
> direct embedding of JPEG 2000 images is possible in the first place.
> Otherwise, you might spend a lot of time on something that may not work
> in the end. I don't know the JPEG 2000 format so I can't tell if it's
> possible without diving into this myself.
> HTH and good luck!
> On 12.09.2009 00:24:41 dan.mccabe wrote:
>> Hey Jeremias,
>> I'm working on this problem with Bill, and it looks like we may be
>> reaching a point where we need to try to tackle embedding JPEG 2000
>> images.  Assuming we do need to go down that path, do you have any
>> recommendations for where we should start?
>> However, this is assuming that we can't find another way to do what we
>> need to do.  Based on your description, it certainly doesn't sound like
>> an easy task to get this implemented, so we really only want to do this
>> as a last resort.  Based on the description of what we are trying to do,
>> do you have any suggestions for an alternative approach that might help
>> us reach our goal?
>> Thanks.
>> Jeremias Maerki-2 wrote:
>> > 
>> > FOP currently produces PDF 1.4 so there's no support for JPEG 2000 yet.
>> > One could (probably) add support for embedding undecoded JPEG 2000
>> > images (JPXDecode) to FOP and add an option with which to control the
>> > PDF version produced by FOP. Of course, that means digging into the
>> > source code of FOP and XML Graphics Commons. I can give you pointers if
>> > you decide to do that.
>> > 
>> > However, I haven't investigated if it's as simple as with JPEG to also
>> > embed JPEG 2000 images. I mention that since I've once tried to get
>> > undecoded PNG graphics directly into PDF. After all, the FlateDecode
>> > filter supports about the same predictors as PNG but I couldn't make
>> > this work in reasonable time. This just as a caveat.
>> > 
>> > On 11.09.2009 04:58:09 Bill Gamble wrote:
>> >> Hello Everyone,
>> >> We are generating PDFs which are very graphic intensive. A typical PDF
>> >> has 50 pages and has 4 4000x4000 images on a page, and the images can
>> >> have transparency.
>> >> 
>> >> We are using Batik for generating each page as an SVG file, and then
>> >> referencing the SVG using <fox:external-document> when converting to
>> >> PDF.
>> >> 
>> >> We run into performance problems when the images embedded in the SVG
>> >> file are anything but JPEGs. JPEGs are lightning fast and have a
>> >> resulting PDF file size 10X smaller than any other format.
>> >> Unfortunately the embedded images can have transparency, so standard
>> >> JPEG format cannot be used, and all other file formats run into memory
>> >> problems and generate enormous PDF files (300MB+).
>> >> 
>> >> After finding that PDF has had support for JPXDecode (for JPEG 2000)
>> >> since 1.5 I was hoping to find that JPEG 2000 could be injected into
>> >> the PDF without the need to decode the image, but that does not appear
>> >> to be the case (we run into the same performance problems with JPEG
>> >> 2000).
>> >> 
>> >> Can anyone comment on:
>> >> 
>> >> 1) Is this a limitation of the PDF format, or how FOP is rendering the
>> >> PDF?
>> >> 2) Any suggestions or other approaches for how to solve our problem?
>> >> 
>> >> Thanks in advance!
>> > 
>> > 
>> > 
>> > 
>> > Jeremias Maerki
>> > 
>> > 
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
>> > For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org
>> > 
>> > 
>> > 
>> -- 
>> View this message in context:
>> http://www.nabble.com/large-image-embedding-problems-tp25394304p25409319.html
>> Sent from the FOP - Users mailing list archive at Nabble.com.
> Jeremias Maerki
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
> For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org

View this message in context: http://www.nabble.com/large-image-embedding-problems-tp25394304p25478390.html
Sent from the FOP - Users mailing list archive at Nabble.com.
