tika-dev mailing list archives

From Sergey Beryozkin <sberyoz...@gmail.com>
Subject Re: HTML to PDF conversion
Date Wed, 16 Oct 2019 15:57:20 GMT
Hi Dave

Thanks, I was suggesting a more neutral approach.

Cheers, Sergey

On Wed, Oct 16, 2019 at 3:50 PM Dave Fisher <wave@apache.org> wrote:

> Hi -
>
> You may want to take a look at Apache FOP which is part of the Apache XML
> Graphics project. My team had success with that in generating PDF from XML.
>
> Regards,
> Dave
>
> > On Oct 16, 2019, at 5:05 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> >
> > Ken, thanks for the feedback, I meant to reply to your comments.
> >
> > I suppose I really meant Tika offering a uniform API to create some simple
> > structured PDF/etc. files:
> > ContentCreator creator = ContentCreator.get("PDF");
> > creator.addTitle("Introduction to Tika");
> > creator.addText("");
> > creator.addTable("tablename", new LinkedHashMap<String, List<String>>());
> > creator.addAttachment(someImage);
> > creator.complete();
> >
> > It would be consistent with the Tika approach on the read side.
> >
> > Cheers, Sergey
> > On Mon, Oct 14, 2019 at 4:13 PM Ken Krugler <kkrugler@apache.org> wrote:
> >
> >> If you’re suggesting ways to make it easier to use something like
> >> YaHPConverter with Tika, definitely yes.
> >>
> >> If you’re talking about integrating this functionality…my personal view is
> >> no.
> >>
> >> I think Tika should focus on extracting content from documents, versus
> >> format transformations.
> >>
> >> Tika is an attractive location for functionality like this, since it sits
> >> in the middle of a lot of data processing pipelines, but I worry about a
> >> bloated code base, with corresponding challenges in maintenance and
> >> support.
> >>
> >> Regards,
> >>
> >> — Ken
> >>
> >>
> >>> On Oct 14, 2019, at 4:38 AM, Sergey Beryozkin <sberyozkin@gmail.com> wrote:
> >>>
> >>> Hi All
> >>>
> >>> I've seen a Quarkus user asking how to convert to PDF, and one of my
> >>> colleagues pointed to
> >>>
> >>> http://www.allcolor.org/YaHPConverter/doc/org/allcolor/yahp/converter/IHtmlToPdfTransformer.html
> >>>
> >>> Does it make sense for Tika to offer something related to text-to-PDF
> >>> conversion (for a start, something on top of that transformer), and then
> >>> maybe even for other formats?
> >>>
> >>> Sergey
> >>
> >> --------------------------
> >> Ken Krugler
> >> http://www.scaleunlimited.com
> >> custom big data solutions & training
> >> Hadoop, Cascading, Cassandra & Solr
> >>
> >>
>
>
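
For illustration only, the ContentCreator idea quoted above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an existing Tika API: the ContentCreator interface, the PlainTextCreator class, the "TEXT" format key, and the String return type of complete() are all hypothetical. It renders plain text instead of PDF so that it stays self-contained, and it leaves out addAttachment because plain text has nowhere to put one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point; every type in this file is hypothetical and not part of Tika.
class ContentCreatorDemo {
    public static void main(String[] args) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("Format", Arrays.asList("PDF", "HTML"));
        columns.put("Status", Arrays.asList("planned", "planned"));

        ContentCreator creator = ContentCreator.get("TEXT");
        creator.addTitle("Introduction to Tika");
        creator.addText("Tika extracts text and metadata from documents.");
        creator.addTable("formats", columns);
        System.out.print(creator.complete());
    }
}

// Format-neutral write API, mirroring the calls in the quoted snippet.
interface ContentCreator {
    void addTitle(String title);
    void addText(String text);
    void addTable(String name, Map<String, List<String>> columns);
    String complete(); // assumption: returns the finished document as a string

    // Factory keyed by format name; only "TEXT" is implemented in this sketch.
    static ContentCreator get(String format) {
        if ("TEXT".equalsIgnoreCase(format)) {
            return new PlainTextCreator();
        }
        throw new IllegalArgumentException("No creator for format: " + format);
    }
}

// Plain-text stand-in for a real PDF/HTML/etc. backend.
final class PlainTextCreator implements ContentCreator {
    private final StringBuilder out = new StringBuilder();

    @Override
    public void addTitle(String title) {
        // Title followed by an underline of the same length.
        out.append(title).append('\n');
        for (int i = 0; i < title.length(); i++) out.append('=');
        out.append("\n\n");
    }

    @Override
    public void addText(String text) {
        out.append(text).append("\n\n");
    }

    @Override
    public void addTable(String name, Map<String, List<String>> columns) {
        // Map keys are column headers; each list holds that column's cells.
        out.append('[').append(name).append("]\n");
        out.append(String.join(" | ", columns.keySet())).append('\n');
        int rows = 0;
        for (List<String> values : columns.values()) rows = Math.max(rows, values.size());
        for (int r = 0; r < rows; r++) {
            List<String> cells = new ArrayList<>();
            for (List<String> values : columns.values()) {
                // Short columns are padded with empty cells.
                cells.add(r < values.size() ? values.get(r) : "");
            }
            out.append(String.join(" | ", cells)).append('\n');
        }
        out.append('\n');
    }

    @Override
    public String complete() {
        return out.toString();
    }
}
```

A PDF or HTML backend would be another implementation selected by ContentCreator.get, which keeps callers independent of the output format in the same way the thread describes for the read side.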
