tika-dev mailing list archives

From Ken Krugler <kkrug...@apache.org>
Subject Re: HTML to PDF conversion
Date Mon, 14 Oct 2019 15:12:48 GMT
If you’re suggesting ways to make it easier to use something like YaHPConverter with Tika,
definitely yes.

If you’re talking about integrating this functionality…my personal view is no.

I think Tika should focus on extracting content from documents, versus format transformations.

Tika is an attractive location for functionality like this, since it sits in the middle of
a lot of data processing pipelines, but I worry about a bloated code base, with corresponding
challenges in maintenance and support.


— Ken

> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> Hi All
> I've seen a Quarkus user asking how to convert to PDF, and one of my
> colleagues pointed to
> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
> Does it make sense for Tika to offer something related to text-to-PDF conversion
> (for a start, something on top of that transformer), and then maybe even
> for other formats?
> Sergey

Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
