xmlgraphics-general mailing list archives

From Vincent Hennebert <vincent.henneb...@anyware-tech.com>
Subject Re: Fw: [DISCUSS] PDFBox proposal
Date Thu, 15 Nov 2007 10:56:38 GMT

Jeremias Maerki wrote:
> Yesterday, we discussed a possible incubation of PDFBox at the ASF.
> There are several projects that are interested in such a move. For us
> here in the XML Graphics project, PDFBox is interesting due to its
> parsing functionality. Our own PDF library doesn't have that
> functionality and is instead optimized for writing PDF, which PDFBox
> isn't.
> As you may know, I've implemented a FOP plug-in that allows embedding of
> PDF in newly generated PDF documents through XSL-FO. Using the same PDF
> library for both tasks would be beneficial in the long-term.
> Please take a look at the incubation proposal (link below) we're
> currently writing. I have some questions for the XML Graphics community
> in this context:
> - Should the XML Graphics PMC be the sponsoring entity? [1]

A small reservation, only because to me PDFBox is “more than that”. See below.

> - Can anyone besides me imagine investing time/resources to help with
> the incubation, teaching PDFBox the additional tricks we need?

I’m afraid not. Not due to a lack of interest, only a lack of time. And 
I’d already like to help with the incubation of JEuclid (even if 
I haven’t done anything in this area so far :-\).

> - Can we imagine PDFBox becoming a subproject of XML Graphics after
> successful incubation? PDF is not really an XML technology but deals
> with graphical output.

This aspect is not a problem for me. We already have PostScript-related 
stuff in Commons, which doesn’t have anything to do with XML either. In 
the long term we should probably emphasise the “Graphics” part of the 
project’s name.

> Newer technologies like XPS (Microsoft's XML
> paper specification) and Adobe's Mars are XML-based paged document
> formats. Not that they play a big role in the market, yet.
> [1] Makes sense if we have a strong interest in PDFBox. If it's just me,
> then it doesn't make sense and we're going to find a different solution.
> Please note: We have some functionality overlap between our PDF library
> and PDFBox in any case. Examples:
> - Writing PDF (org.apache.fop.pdf)
> - Parsing fonts (org.apache.fop.fonts, org.apache.batik.svggen.font.table)
> - Font conversion (org.apache.batik.svggen.font)
> - XMP metadata (org.apache.xmlgraphics.xmp)
> - Image loading (org.apache.fop.image, org.apache.batik.ext.awt.image.spi)
> BTW, the above table shows some spots where we could actually discuss
> better cooperation within XML Graphics, i.e. between Batik & FOP.
> Thoughts?

A few. I lack some expertise in that whole area, but here they are, in 
the hope they will be useful:
- my understanding is that our PDF library is quite specialized for 
  producing output from the area tree. In the end there is probably some 
  common stuff that can be factored out of the several renderers 
  (mainly: AFP, PostScript, PDF). I’m not sure PDFBox would integrate 
  smoothly in that scheme.
- to a certain extent, the same issue may arise with fonts. Our needs 
  go slightly further than just parsing PostScript/TrueType/OpenType 
  fonts and embedding them in the PDF output. We also need to embed 
  them in PostScript, convert them into AWT fonts, possibly use them in 
  AFP, etc. Ideally, a separate sub-project dedicated to fonts, on which 
  FOP, Batik and PDFBox would all rely, would be the way to go.
- that said, we would probably benefit from a general-purpose PDF 
  library that would provide us with extra functionality like 
  encryption (and tagged PDF?). It might make sense to keep our output 
  library in a minimal form, and use PDFBox as a post-processor for 
  optimizing the output, adding encryption or whatever.
  You mentioned a PDF/A validator, but even a general PDF validator 
  could be useful.

I’m slightly doubtful it would make sense to have PDFBox as an XML 
Graphics subproject, because it has both too many and not enough 
features for our needs. It’s obvious, though, that stuff can be shared 
between the projects, and that one would have its place as 
a sub-project. But PDFBox probably deserves to be a top-level project, 
all the more so if other Apache projects would also make use of it. For 
us it would be a dependency, like the other jars in the lib/ directory. 
That said, if I had to vote, it would probably be a +0.9.

Hope that all makes sense,

Apache XML Graphics Project URL: http://xmlgraphics.apache.org/
