xmlgraphics-fop-users mailing list archives

From Jean-François El Fouly <jean-franc...@elfouly.fr>
Subject Re: Newbie question
Date Fri, 04 Sep 2009 07:44:55 GMT

On 4 Sept. 09, at 03:34, Dola Woolfe wrote:

> I'm trying to put together several elements to build a PDF translator.
> 1. Load a PDF in a foreign language (???)
> 2. Translate the content (Google Translate)
> 3. Output the translated PDF (FOP)
> So I'm guessing step 1 is not part of FOP. Can you perhaps recommend  
> what I can use for 1.?
> Thanks again!

I think you should try iText. You will find an explanation of what you
need near the end of "iText in Action", the authoritative book by
Bruno Lowagie, the guy who designed iText in the first place. And
before proceeding with your project you *should* read the caveats in his
book: extracting text content from an existing PDF may not be as
straightforward as you think, and in certain situations it may make
almost no sense. A PDF API will get you the text content in the
order it was technically generated, which may not be the "textual"
order (the order in which you read the elements in a book).
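To make the ordering problem concrete, here is a minimal sketch in plain Java (no iText involved). It assumes a hypothetical Chunk record holding a text fragment and the position where it was drawn, the kind of information a location-aware text extractor can report, and regroups the fragments top to bottom, then left to right. This is a naive heuristic chosen for illustration, not iText's algorithm, and it will still mix up multi-column layouts.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadingOrder {
    // Hypothetical text fragment plus the position where it was drawn,
    // in PDF user space (origin at bottom-left, y grows upward).
    record Chunk(String text, double x, double y) {}

    // Assumed tolerance, in points, for treating chunks as one line.
    static final double LINE_TOLERANCE = 2.0;

    // Reorders chunks given in content-stream (drawing) order into
    // top-to-bottom, left-to-right order, one output line per text line.
    static String reorder(List<Chunk> chunks) {
        List<Chunk> byY = new ArrayList<>(chunks);
        byY.sort((a, b) -> Double.compare(b.y(), a.y())); // top of page first

        // Group chunks whose y is within LINE_TOLERANCE of the line's first chunk.
        List<List<Chunk>> lines = new ArrayList<>();
        List<Chunk> current = new ArrayList<>();
        double lineY = 0;
        for (Chunk c : byY) {
            if (!current.isEmpty() && Math.abs(lineY - c.y()) > LINE_TOLERANCE) {
                lines.add(current);
                current = new ArrayList<>();
            }
            if (current.isEmpty()) {
                lineY = c.y();
            }
            current.add(c);
        }
        if (!current.isEmpty()) {
            lines.add(current);
        }

        // Within each line, order chunks left to right and join with spaces.
        StringBuilder out = new StringBuilder();
        for (List<Chunk> line : lines) {
            line.sort((a, b) -> Double.compare(a.x(), b.x()));
            List<String> words = new ArrayList<>();
            for (Chunk c : line) {
                words.add(c.text());
            }
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(String.join(" ", words));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Drawing order as a generator might emit it: the second line first,
        // and the right part of the first line before its left part.
        List<Chunk> drawn = List.of(
                new Chunk("reads fine.", 72, 686),
                new Chunk("this line", 150, 700),
                new Chunk("Hopefully", 72, 700.5));
        System.out.println(reorder(drawn));
    }
}
```

Run as is, it prints "Hopefully this line" and then "reads fine." on the next line, whereas dumping the chunks in drawing order would start with "reads fine.".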
My own experience on top of this is that it is very difficult to
extract text content set in non-European or large fonts (the CID-keyed
fonts; roughly speaking, those that cover more than the WinAnsi or
ISO-8859-1 character set).
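To illustrate why CID-keyed fonts are harder, here is a rough sketch, again in plain Java. It assumes two-byte, big-endian character codes (as with the Identity-H encoding) and models the font's ToUnicode mapping as a hypothetical Map; in a real PDF this information lives in a CMap stream that may be missing or incomplete. With a standard single-byte encoding each byte maps directly to a character, whereas with a CID font the codes identify glyphs, and without that mapping there is no reliable way back to the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class CidDecode {
    // Simple font with a single-byte encoding: one byte, one character.
    // ISO-8859-1 is used here; WinAnsiEncoding is broadly similar but
    // differs in the 0x80-0x9F range.
    static String decodeSingleByte(byte[] codes) {
        return new String(codes, StandardCharsets.ISO_8859_1);
    }

    // CID-keyed font, assuming two-byte big-endian codes (Identity-H style).
    // The codes identify glyphs, not characters; the only way back to text is
    // the font's ToUnicode mapping, modelled as a hypothetical Map.
    // Codes with no entry become U+FFFD; a trailing odd byte is ignored.
    static String decodeTwoByte(byte[] codes, Map<Integer, String> toUnicode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i + 1 < codes.length; i += 2) {
            int cid = ((codes[i] & 0xFF) << 8) | (codes[i + 1] & 0xFF);
            out.append(toUnicode.getOrDefault(cid, "\uFFFD"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Cafe" with an e-acute (0xE9): readable with no extra information.
        System.out.println(decodeSingleByte(new byte[] {0x43, 0x61, 0x66, (byte) 0xE9}));

        // Two glyph codes, but the (hypothetical) ToUnicode map covers only one.
        byte[] glyphCodes = {0x03, 0x1A, 0x05, 0x42};
        Map<Integer, String> partialToUnicode = Map.of(0x031A, "\u65E5");
        System.out.println(decodeTwoByte(glyphCodes, partialToUnicode));
    }
}
```

The first call recovers the accented word directly from the bytes; the second yields one character followed by U+FFFD, because the map has no entry for code 0x0542. That second situation is roughly what an extractor faces when a CID font's mapping is missing or incomplete.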

