tika-dev mailing list archives

From Sergey Beryozkin <sberyoz...@gmail.com>
Subject Re: HTML to PDF conversion
Date Wed, 16 Oct 2019 15:59:30 GMT
That was not what I was suggesting. My only proposal was to have a
simple API (without attempting to cover all the various format-specific
options at the API level) that would let Tika users quickly create
format-specific content without having to deal with the format-specific
libraries, exactly consistent with what Tika does on the read side.
I appreciate it could require some effort, and I'm by no means pushing for it.

Sergey

On Wed, Oct 16, 2019 at 4:50 PM Ken Krugler <kkrugler@apache.org> wrote:

> I can see the attraction of one API to convert XHTML to various formats.
>
> Though very quickly that simple API would become complex, as each target
> format has its own conversion options.
>
> And if successful, we’d pull in even more 3rd party jars to handle that
> conversion.
>
> Wonder if there’s a need for a new project called “Akit”, which focuses on
> XHTML -> various formats :)
>
> — Ken
>
> > On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> >
> > Ken, thanks for the feedback; I meant to reply to your comments.
> >
> > I suppose what I really meant was Tika offering a uniform API to create
> > some simple structured PDF (etc.) files:
> > ContentCreator creator = ContentCreator.get("PDF");
> > creator.addTitle("Introduction to Tika");
> > creator.addText("");
> > creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
> > creator.addAttachment(someImage);
> > creator.complete();
> >
> > It would be consistent with the Tika approach on the read side.
> >
> > Cheers, Sergey
> > On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrugler@apache.org> wrote:
> >
> >> If you’re suggesting ways to make it easier to use something like
> >> YaHPConverter with Tika, definitely yes.
> >>
> >> If you’re talking about integrating this functionality… my personal view
> >> is no.
> >>
> >> I think Tika should focus on extracting content from documents, versus
> >> format transformations.
> >>
> >> Tika is an attractive location for functionality like this, since it
> >> sits in the middle of a lot of data processing pipelines, but I worry
> >> about a bloated code base, with corresponding challenges in maintenance
> >> and support.
> >>
> >> Regards,
> >>
> >> — Ken
> >>
> >>
> >>> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> >>>
> >>> Hi All
> >>>
> >>> I've seen a Quarkus user asking how to convert to PDF, and one of my
> >>> colleagues pointed to
> >>> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
> >>>
> >>> Does it make sense for Tika to offer something related to text-to-PDF
> >>> conversion (for a start, something on top of that transformer), and then
> >>> maybe even for other formats?
> >>>
> >>> Sergey
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
> >>
> >>
>
> --------------------------
> Ken Krugler
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
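
The `ContentCreator` snippet in the thread is a proposal, not an existing Tika API. The Java sketch below is a rough illustration of the idea Sergey describes: one uniform write-side API, with the format-specific libraries hidden behind a lookup by format name. Everything in it is hypothetical and assumed for illustration. That includes the `ContentCreator` interface, the `Backends` registry, the toy `PlainTextCreator`, the `addAttachment(name, bytes)` signature, and `complete()` returning the finished bytes. It targets Java 11 or later. A real PDF backend would have to wrap a third-party library, which is the dependency cost Ken raises.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Usage demo mirroring the snippet from the thread; the types it uses are defined below.
class ContentCreatorDemo {
    public static void main(String[] args) {
        ContentCreator creator = ContentCreator.get("TXT");
        creator.addTitle("Introduction to Tika");
        creator.addText("Some body text.");
        Map<String, List<String>> table = new LinkedHashMap<>(); // column name -> cells
        table.put("name", List.of("alpha", "beta"));
        table.put("size", List.of("1", "2"));
        creator.addTable("items", table);
        creator.addAttachment("logo.png", new byte[] {1, 2, 3});
        System.out.print(new String(creator.complete(), StandardCharsets.UTF_8));
    }
}

// Hypothetical write-side API modelled on the thread's ContentCreator; not part of Tika.
interface ContentCreator {
    void addTitle(String title);
    void addText(String text);
    // Assumption: a table is an ordered map of column name -> column cells.
    void addTable(String name, Map<String, List<String>> columns);
    // Assumption: an attachment is a file name plus its raw bytes.
    void addAttachment(String name, byte[] data);
    // Assumption: returns the finished document bytes.
    byte[] complete();

    // Callers only name a format; the backend and its libraries stay behind this lookup.
    static ContentCreator get(String format) {
        Supplier<ContentCreator> factory = Backends.MAP.get(format.toUpperCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No ContentCreator registered for " + format);
        }
        return factory.get();
    }
}

// Hypothetical registry keyed by upper-case format name; a PDF backend would be added here too.
final class Backends {
    static final Map<String, Supplier<ContentCreator>> MAP = new ConcurrentHashMap<>();
    static {
        MAP.put("TXT", PlainTextCreator::new);
    }
    private Backends() {}
}

// Toy backend that renders the document as UTF-8 plain text.
final class PlainTextCreator implements ContentCreator {
    private final StringBuilder out = new StringBuilder();

    @Override public void addTitle(String title) {
        out.append(title).append('\n').append("=".repeat(title.length())).append("\n\n");
    }

    @Override public void addText(String text) {
        out.append(text).append("\n\n");
    }

    @Override public void addTable(String name, Map<String, List<String>> columns) {
        out.append("Table: ").append(name).append('\n');
        out.append(String.join(" | ", columns.keySet())).append('\n');
        int rows = columns.values().stream().mapToInt(List::size).max().orElse(0);
        for (int r = 0; r < rows; r++) {
            List<String> row = new ArrayList<>();
            for (List<String> cells : columns.values()) {
                row.add(r < cells.size() ? cells.get(r) : ""); // pad short columns
            }
            out.append(String.join(" | ", row)).append('\n');
        }
        out.append('\n');
    }

    @Override public void addAttachment(String name, byte[] data) {
        // Plain text cannot embed binary content, so only a placeholder line is written.
        out.append("[attachment: ").append(name).append(", ")
           .append(data.length).append(" bytes]\n\n");
    }

    @Override public byte[] complete() {
        return out.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The sketch keeps format-specific options out of the interface, as Sergey suggests. The consequence is that each backend has to choose its own defaults. Exposing those options per format is where Ken expects such an API to grow.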
