tika-dev mailing list archives

From Sergey Beryozkin <sberyoz...@gmail.com>
Subject Re: Quarkus integration
Date Thu, 15 Aug 2019 14:15:15 GMT
Hi,
The initial documentation is here:
https://quarkus.io/guides/tika-guide

Lots more to come over time. A few users have already tried it (not many,
but we hope to see more feedback from them soon).
Sergey

On Fri, May 10, 2019 at 6:04 PM Sergey Beryozkin <sberyozkin@gmail.com>
wrote:

> I've managed to get the PDFParser running in native mode, but I had to
> delay the initialization of org.apache.pdfbox.pdmodel.font.PDType1Font.
> This class has static PDType1Font instances, one of which leads to
> org.apache.fontbox.ttf.RAFDataStream, which opens a file handle, so Graal
> cannot convert it to native code at build time. The initialization of
> PDType1Font therefore has to be delayed until run time.
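To make the timing issue concrete, here is a minimal sketch of how static initialization behaves on a regular JVM: the static initializer runs once, on first use of the class. A native image build may run such initializers at build time instead, which is a problem when one of them opens a file. The classes below (InitLog, FontConstants, ClassInitDemo) are hypothetical stand-ins, not PDFBox types. With GraalVM native-image a delay like the one described is typically requested with the --initialize-at-run-time option; that this exact option was used here is an assumption.

```java
// Hypothetical classes for illustration only; none of these are PDFBox types.
class InitLog {
    // Counts how many times the FontConstants static initializer has run.
    static int count = 0;
}

class FontConstants {
    // Assigned in the static block, so this is not a compile-time constant
    // and reading it triggers class initialization.
    static final String DEFAULT_FONT;

    static {
        // In the real case, work at this point ends up opening a file.
        InitLog.count++;
        DEFAULT_FONT = "Helvetica";
    }
}

class ClassInitDemo {
    public static void main(String[] args) {
        // FontConstants has not been used yet at this point.
        System.out.println("initializer runs before first use: " + InitLog.count);

        // First use: the JVM initializes FontConstants now.
        String font = FontConstants.DEFAULT_FONT;
        System.out.println("font: " + font);

        // Later uses do not run the initializer again.
        String again = FontConstants.DEFAULT_FONT;
        System.out.println("initializer runs after two uses: " + InitLog.count);
    }
}
```

The point of the sketch is only that the side effect is tied to class initialization, so moving initialization to run time moves the side effect with it.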
>
> If we start from the PDF parser, the call path to RAFDataStream starts
> from:
>
>
> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.verifyOrCreateDefaults(PDAcroForm.java:106)
>      at
> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.<init>(PDAcroForm.java:93)
>      at
> org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:108)
>      at
> org.apache.tika.parser.pdf.PDFParser.handleXFAOnly(PDFParser.java:534)
>
> I guess I may need to create a PR for PDFBox where RAFDataStream opens a
> stream lazily, with a check like ensureOpen() being added to its read
> methods...
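The lazy-open idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the PDFBox RAFDataStream API: LazyFileStream and its methods are hypothetical names, and only the ensureOpen() name comes from the message above. The constructor stores the file reference, and the handle is opened on the first read.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical class for illustration; not the actual RAFDataStream.
class LazyFileStream implements AutoCloseable {
    private final File file;
    private RandomAccessFile raf; // stays null until the first read

    LazyFileStream(File file) {
        // No file handle is opened here, so constructing an instance
        // (for example from a static initializer) has no I/O side effect.
        this.file = file;
    }

    boolean isOpen() {
        return raf != null;
    }

    // Opens the underlying file on demand; called by every read method.
    private void ensureOpen() throws IOException {
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
        }
    }

    int read() throws IOException {
        ensureOpen();
        return raf.read();
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}

class LazyFileStreamDemo {
    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lazy", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "A".getBytes(StandardCharsets.US_ASCII));

        try (LazyFileStream stream = new LazyFileStream(tmp)) {
            System.out.println("open after construction: " + stream.isOpen());
            System.out.println("first byte: " + (char) stream.read());
            System.out.println("open after read: " + stream.isOpen());
        }
    }
}
```

The trade-off is that an unreadable file is reported at the first read instead of at construction time.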
>
> Sergey
>
> On Fri, May 3, 2019 at 1:22 PM Sergey Beryozkin <sberyozkin@gmail.com>
> wrote:
>
>> Yes, please add 'sergeyb', I've just assigned myself a CXF issue as
>> 'sergeyb'. Sorry about these multiple ids, but indeed I'll try to keep
>> using a single one.
>>
>> Thanks, Sergey
>>
>>
>>
>> On Fri, May 3, 2019 at 12:13 PM Tim Allison <tallison@apache.org> wrote:
>>
>>> I can add 'sergeyb' if you'd prefer!
>>>
>>> On Fri, May 3, 2019 at 5:43 AM Sergey Beryozkin <sberyozkin@gmail.com>
>>> wrote:
>>> >
>>> > Though I might need to settle on 'sergeyb' eventually, since it is my
>>> > Apache committer id.
>>> > Thanks...
>>> >
>>> > On Fri, May 3, 2019 at 10:29 AM Sergey Beryozkin <sberyozkin@gmail.com>
>>> > wrote:
>>> >
>>> > > Oh, I forgot I had a 'sergey_beryozkin' id as well. This is not good; it
>>> > > shows how long ago I last contributed :-) (I did try sergey.beryozkin,
>>> > > though).
>>> > >
>>> > > Thanks for checking it, I've just assigned this issue to myself.
>>> > > Cheers, Sergey
>>> > >
>>> > >
>>> > > On Thu, May 2, 2019 at 6:08 PM Sergey Beryozkin <sberyozkin@gmail.com>
>>> > > wrote:
>>> > >
>>> > >> Hi Tim
>>> > >>
>>> > >> I can't assign
>>> > >> https://issues.apache.org/jira/browse/TIKA-2862
>>> > >>
>>> > >> to myself, though I used to be able to. I know I've had some time away
>>> > >> from Tika, but I'm keen to return with a few contributions :-)
>>> > >> Please update my record so that I can assign issues again.
>>> > >>
>>> > >> Cheers, Sergey
>>> > >>
>>> > >> On Tue, Apr 30, 2019 at 6:22 PM Sergey Beryozkin <sberyozkin@gmail.com>
>>> > >> wrote:
>>> > >>
>>> > >>> Hi Tim, All
>>> > >>>
>>> > >>> I've started working on integrating Tika with Quarkus [1]. The main
>>> > >>> idea is to be able to use Tika in native image mode.
>>> > >>> I'll quite likely start creating PRs soon to get the native image
>>> > >>> related issues resolved; these are related to some libraries
>>> > >>> statically initializing FileDescriptors, etc.
>>> > >>>
>>> > >>> Thanks, Sergey
>>> > >>>
>>> > >>> [1] https://github.com/sberyozkin/quarkus/tree/tika_extension/extensions/tika
>>> > >>> [2] https://github.com/sberyozkin/quarkus-quickstarts/tree/tika/getting-started-tika
>>> > >>>
>>> > >>>
>>>
>>
