tika-dev mailing list archives

From: Ken Krugler <kkrug...@apache.org>
Subject: Re: HTML to PDF conversion
Date: Wed, 16 Oct 2019 15:50:12 GMT
I can see the attraction of one API to convert XHTML to various formats.

But that simple API would quickly become complex, as each target format has its own
conversion options.

And if it succeeded, we’d pull in even more third-party jars to handle those conversions.

I wonder if there’s a need for a new project called “Akit”, which focuses on converting XHTML to
various formats :)

— Ken

> On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> 
> Ken, thanks for the feedback. I meant to reply to your comments.
> 
> I suppose what I really meant was Tika offering a uniform API for creating
> simple structured PDF (and other format) files, for example:
> ContentCreator creator = ContentCreator.get("PDF");
> creator.addTitle("Introduction to Tika");
> creator.addText("");
> creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
> creator.addAttachment(someImage);
> creator.complete();
> 
> It would be consistent with the Tika approach on the read side.
> 
> Cheers, Sergey
> On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrugler@apache.org> wrote:
> 
>> If you’re suggesting ways to make it easier to use something like
>> YaHPConverter with Tika, definitely yes.
>> 
>> If you’re talking about integrating this functionality into Tika, my
>> personal view is no.
>> 
>> I think Tika should focus on extracting content from documents rather than
>> on format transformations.
>> 
>> Tika is an attractive location for functionality like this, since it sits
>> in the middle of a lot of data processing pipelines, but I worry about a
>> bloated code base, with corresponding challenges in maintenance and support.
>> 
>> Regards,
>> 
>> — Ken
>> 
>> 
>>> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
>>> 
>>> Hi All
>>> 
>>> I've seen a Quarkus user asking how to convert to PDF, and one of my
>>> colleagues pointed to
>>>
>>> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
>>> 
>>> Does it make sense for Tika to offer something for text-to-PDF conversion
>>> (for a start, something on top of that transformer), and then maybe even
>>> for other formats?
>>> 
>>> Sergey
>> 
>> --------------------------
>> Ken Krugler
>> http://www.scaleunlimited.com
>> custom big data solutions & training
>> Hadoop, Cascading, Cassandra & Solr
>> 
>> 

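Sergey's `ContentCreator` snippet in the thread is pseudocode for an API that does not exist in Tika. As a rough illustration only, here is a minimal Java sketch of what such a uniform write-side interface could look like, assuming that `complete()` returns the finished document as a String and that there is a single XHTML backend built with the JDK alone. `ContentCreator`, `XhtmlContentCreator`, and `ContentCreatorDemo` are hypothetical names, and `addAttachment` is left out. A PDF backend would need a third-party library such as YaHPConverter, which is the dependency growth Ken is worried about.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical write-side API modelled on Sergey's ContentCreator sketch.
// This is NOT part of Apache Tika; all names here are illustrative.
interface ContentCreator {
    void addTitle(String title);

    void addText(String text);

    // Column name -> column values; LinkedHashMap keeps the column order.
    void addTable(String name, LinkedHashMap<String, List<String>> columns);

    // Returns the finished document; this sketch uses a String for simplicity.
    String complete();

    // Factory keyed by target format, mirroring ContentCreator.get("PDF").
    // Only an XHTML backend exists here; the JDK has no PDF writer.
    static ContentCreator get(String format) {
        if ("XHTML".equalsIgnoreCase(format)) {
            return new XhtmlContentCreator();
        }
        throw new IllegalArgumentException("Unsupported format: " + format);
    }
}

// Builds a minimal XHTML document using only the JDK.
final class XhtmlContentCreator implements ContentCreator {
    private final StringBuilder body = new StringBuilder();
    private String title = "";

    @Override
    public void addTitle(String title) {
        this.title = title;
        body.append("<h1>").append(escape(title)).append("</h1>");
    }

    @Override
    public void addText(String text) {
        body.append("<p>").append(escape(text)).append("</p>");
    }

    @Override
    public void addTable(String name, LinkedHashMap<String, List<String>> columns) {
        body.append("<table title=\"").append(escape(name)).append("\"><tr>");
        for (String header : columns.keySet()) {
            body.append("<th>").append(escape(header)).append("</th>");
        }
        body.append("</tr>");
        // Columns may have different lengths; pad the shorter ones with empty cells.
        int rows = columns.values().stream().mapToInt(List::size).max().orElse(0);
        for (int r = 0; r < rows; r++) {
            body.append("<tr>");
            for (List<String> column : columns.values()) {
                String cell = r < column.size() ? column.get(r) : "";
                body.append("<td>").append(escape(cell)).append("</td>");
            }
            body.append("</tr>");
        }
        body.append("</table>");
    }

    @Override
    public String complete() {
        return "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>"
                + escape(title) + "</title></head><body>" + body + "</body></html>";
    }

    // Escapes the characters that would otherwise break the markup ('&' first).
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}

public class ContentCreatorDemo {
    public static void main(String[] args) {
        ContentCreator creator = ContentCreator.get("XHTML");
        creator.addTitle("Introduction to Tika");
        creator.addText("Tika extracts text and metadata from documents.");
        LinkedHashMap<String, List<String>> table = new LinkedHashMap<>();
        table.put("Format", Arrays.asList("PDF", "XHTML"));
        table.put("Implemented in this sketch", Arrays.asList("no", "yes"));
        creator.addTable("formats", table);
        System.out.println(creator.complete());
    }
}
```

Supporting another format would mean another class behind `get`, and any option that only one format understands has no natural place in the shared methods. That is the kind of complexity Ken predicts for a single conversion API.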