tika-dev mailing list archives

From: Dave Fisher <w...@apache.org>
Subject: Re: HTML to PDF conversion
Date: Wed, 16 Oct 2019 14:50:41 GMT
Hi -

You may want to take a look at Apache FOP, which is part of the Apache XML Graphics project.
My team had success using it to generate PDF from XML.
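FOP is an XSL-FO formatter, so the usual XML-to-PDF route is to transform the source XML into XSL-FO with an XSLT stylesheet and then let FOP render that XSL-FO to PDF. A minimal sketch of the first step only, assuming a made-up `<doc>` input format and stylesheet and using just the JDK's `javax.xml.transform` API; the FOP rendering step is left out because it needs the FOP jars on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToFoSketch {

    // Hypothetical stylesheet for a made-up input format:
    // <doc><title>...</title><para>...</para>...</doc>
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:template match='/doc'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-height='29.7cm'"
        + " page-width='21cm' margin='2cm'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<fo:block font-size='18pt' font-weight='bold'>"
        + "<xsl:value-of select='title'/></fo:block>"
        + "<xsl:for-each select='para'>"
        + "<fo:block><xsl:value-of select='.'/></fo:block>"
        + "</xsl:for-each>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the XSL-FO text,
    // which is the kind of input FOP renders to PDF.
    static String toXslFo(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Introduction to Tika</title><para>Hello, FOP.</para></doc>";
        System.out.println(toXslFo(xml));
    }
}
```

Keeping the layout in the stylesheet means the XML producer never deals with page geometry; only the stylesheet changes when the PDF layout does.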

Regards,
Dave

> On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> 
> Ken, thanks for the feedback. I meant to reply to your comments.
> 
> I suppose what I really meant was Tika offering a uniform API for creating
> simple, structured PDF (and other format) files, for example:
> ContentCreator creator = ContentCreator.get("PDF");
> creator.addTitle("Introduction to Tika");
> creator.addText("");
> creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
> creator.addAttachment(someImage);
> creator.complete();
> 
> It would be consistent with Tika's approach on the read side.
> 
> Cheers, Sergey
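The snippet above only names the calls; Tika has no such API. Below is a minimal sketch of how a write-side counterpart could be shaped, assuming a hypothetical `ContentCreator` interface with a static `get(format)` factory and a single plain-text backend standing in for a real PDF one. The `addAttachment(String, byte[])` signature is an assumption, since the snippet does not say what type `someImage` has.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ContentCreatorDemo {
    public static void main(String[] args) {
        ContentCreator creator = ContentCreator.get("TXT");
        creator.addTitle("Introduction to Tika");
        creator.addText("Some body text.");
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("Format", List.of("PDF", "HTML"));
        table.put("Extension", List.of(".pdf", ".html"));
        creator.addTable("formats", table);
        creator.addAttachment("logo.png", new byte[16]);
        System.out.print(new String(creator.complete(), StandardCharsets.UTF_8));
    }
}

// Hypothetical write-side API modelled on the snippet above; not part of Tika.
interface ContentCreator {
    void addTitle(String title);
    void addText(String text);
    // Table as column name -> cell values; a LinkedHashMap keeps column order.
    void addTable(String name, Map<String, List<String>> columns);
    // Assumed signature: the original snippet leaves the attachment type open.
    void addAttachment(String name, byte[] data);
    // Finishes the document and returns the rendered bytes.
    byte[] complete();

    static ContentCreator get(String format) {
        switch (format.toUpperCase(Locale.ROOT)) {
            case "TXT":
                return new PlainTextCreator();
            // A "PDF" case would return a backend built on a PDF library (not shown).
            default:
                throw new IllegalArgumentException("Unsupported format: " + format);
        }
    }
}

// Illustrative backend that renders everything as plain text.
class PlainTextCreator implements ContentCreator {
    private final StringBuilder out = new StringBuilder();

    @Override public void addTitle(String title) {
        out.append(title).append('\n').append("=".repeat(title.length())).append("\n\n");
    }

    @Override public void addText(String text) {
        out.append(text).append("\n\n");
    }

    @Override public void addTable(String name, Map<String, List<String>> columns) {
        out.append('[').append(name).append("]\n");
        out.append(String.join(" | ", columns.keySet())).append('\n');
        int rows = columns.values().stream().mapToInt(List::size).max().orElse(0);
        for (int r = 0; r < rows; r++) {
            List<String> cells = new ArrayList<>();
            for (List<String> column : columns.values()) {
                // Shorter columns are padded with empty cells.
                cells.add(r < column.size() ? column.get(r) : "");
            }
            out.append(String.join(" | ", cells)).append('\n');
        }
        out.append('\n');
    }

    @Override public void addAttachment(String name, byte[] data) {
        // Plain text cannot embed binary data, so only a reference is written.
        out.append("[attachment: ").append(name).append(", ")
           .append(data.length).append(" bytes]\n");
    }

    @Override public byte[] complete() {
        return out.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The design mirrors the read side: callers code against one interface, and each output format lives in its own backend, so a PDF backend (built on FOP, PDFBox or a similar library) could be added without changing calling code.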
> On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrugler@apache.org> wrote:
> 
>> If you’re suggesting ways to make it easier to use something like
>> YaHPConverter with Tika, definitely yes.
>> 
>> If you’re talking about integrating this functionality into Tika, my
>> personal view is no.
>> 
>> I think Tika should focus on extracting content from documents, rather
>> than on format transformations.
>> 
>> Tika is an attractive location for functionality like this, since it sits
>> in the middle of a lot of data processing pipelines, but I worry about a
>> bloated code base, with corresponding challenges in maintenance and support.
>> 
>> Regards,
>> 
>> — Ken
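Ken's first option, making an external converter easier to use from Tika without adding it to the core code base, could be done through a plug-in lookup. A minimal sketch, assuming a hypothetical `HtmlToPdfConverter` interface (not a Tika or YaHPConverter API) discovered with the JDK's `java.util.ServiceLoader` (Java 9 or later), which reads the same kind of `META-INF/services` entries Tika uses to find its parsers:

```java
import java.util.Optional;
import java.util.ServiceLoader;

class ConverterLookupDemo {
    public static void main(String[] args) {
        Optional<HtmlToPdfConverter> converter = HtmlToPdfConverter.find();
        System.out.println(converter.isPresent()
            ? "Using " + converter.get().getClass().getName()
            : "No HTML-to-PDF converter installed; PDF output disabled");
    }
}

// Hypothetical extension point; neither Tika nor YaHPConverter defines it.
// A separate jar could implement it by wrapping an HTML-to-PDF library and
// list the implementation class in META-INF/services/HtmlToPdfConverter
// (the file name is the interface's fully qualified name).
interface HtmlToPdfConverter {
    byte[] convert(String html);

    static Optional<HtmlToPdfConverter> find() {
        // Scans the classpath for registered providers and returns the first
        // one, or an empty Optional when none is installed.
        return ServiceLoader.load(HtmlToPdfConverter.class).findFirst();
    }
}
```

With no provider jar installed, `find()` returns an empty `Optional` and the core code keeps working; adding a jar that implements the interface enables conversion without a compile-time dependency on the converter library.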
>> 
>> 
>>> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
>>> 
>>> Hi All
>>> 
>>> I've seen a Quarkus user asking how to convert to PDF, and one of my
>>> colleagues pointed to:
>>>
>>> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
>>> 
>>> Does it make sense for Tika to offer something for text-to-PDF conversion
>>> (for a start, something on top of that transformer), and then maybe even
>>> for other formats?
>>> 
>>> Sergey
>> 
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 

