tika-dev mailing list archives

From Sergey Beryozkin <sberyoz...@gmail.com>
Subject Re: HTML to PDF conversion
Date Wed, 16 Oct 2019 12:05:44 GMT
Ken, thanks for the feedback; I had meant to reply to your comments.

What I really had in mind was Tika offering a uniform API for creating
simple structured files in PDF and other formats, for example:

ContentCreator creator = ContentCreator.get("PDF");
creator.addTitle("Introduction to Tika");
creator.addText("");
creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
creator.addAttachment(someImage);
creator.complete();

It would be consistent with the Tika approach on the read side.
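
To make the idea a bit more concrete, here is a minimal sketch of how such a
write-side API might be shaped. Everything in it is an assumption for
illustration: ContentCreator is not an existing Tika type, the "TEXT" format
and the PlainTextCreator backend are hypothetical stand-ins for a real PDF
backend, complete() is assumed to return the rendered output, and
addAttachment is left out because the attachment type is not specified.

```java
import java.util.List;
import java.util.Map;

// Hypothetical write-side API modelled on the calls in the snippet above.
// None of these types exist in Tika; this is only a sketch.
interface ContentCreator {
    void addTitle(String title);
    void addText(String text);
    void addTable(String name, Map<String, List<String>> columns);
    String complete(); // assumption: returns the rendered document

    // Look up a backend by format name, mirroring ContentCreator.get("PDF").
    static ContentCreator get(String format) {
        if ("TEXT".equalsIgnoreCase(format)) {
            return new PlainTextCreator();
        }
        throw new IllegalArgumentException("No creator registered for format: " + format);
    }
}

// Toy backend that renders plain text. A real PDF backend would delegate
// to a PDF library instead of appending to a StringBuilder.
final class PlainTextCreator implements ContentCreator {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void addTitle(String title) {
        out.append(title).append('\n');
        // Underline the title with '=' characters of the same length.
        for (int i = 0; i < title.length(); i++) {
            out.append('=');
        }
        out.append('\n');
    }

    @Override
    public void addText(String text) {
        out.append(text).append('\n');
    }

    @Override
    public void addTable(String name, Map<String, List<String>> columns) {
        out.append("[table ").append(name).append("]\n");
        // One line per column: header followed by its cell values.
        columns.forEach((header, cells) ->
            out.append(header).append(": ").append(String.join(", ", cells)).append('\n'));
    }

    @Override
    public String complete() {
        return out.toString();
    }
}
```

The caller only ever sees the ContentCreator interface and a format name, so
supporting another output format would mean registering another backend
rather than changing calling code.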

Cheers, Sergey
On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrugler@apache.org> wrote:

> If you’re suggesting ways to make it easier to use something like
> YaHPConverter with Tika, definitely yes.
>
> If you’re talking about integrating this functionality…my personal view is
> no.
>
> I think Tika should focus on extracting content from documents, versus
> format transformations.
>
> Tika is an attractive location for functionality like this, since it sits
> in the middle of a lot of data processing pipelines, but I worry about a
> bloated code base, with corresponding challenges in maintenance and support.
>
> Regards,
>
> — Ken
>
>
> > On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> >
> > Hi All
> >
> > I've seen a Quarkus user asking how to convert to PDF, and one of my
> > colleagues pointed to
> >
> > http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
> >
> > Does it make sense for Tika to offer something for text-to-PDF
> > conversion (for a start, something on top of that transformer), and then
> > maybe even for other formats?
> >
> > Sergey
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
