tika-dev mailing list archives

From Sergey Beryozkin <sberyoz...@gmail.com>
Subject Re: Quarkus integration
Date Thu, 15 Aug 2019 14:21:29 GMT
If someone from the large Tika team can give that extension a try, whenever
time allows, that would be super; it will help me improve the extension. If
you do decide to try it, please post your feedback to
https://groups.google.com/forum/#!forum/quarkus-dev
or, if it fails miserably for your documents, maybe here first :-)
Cheers, Sergey

On Thu, Aug 15, 2019 at 3:15 PM Sergey Beryozkin <sberyozkin@gmail.com>
wrote:

> Hi,
> The initial documentation is here:
> https://quarkus.io/guides/tika-guide
>
> Lots more to come over time, and we have already had users trying it (not
> many, but I hope to see more feedback from them soon).
> Sergey
>
> On Fri, May 10, 2019 at 6:04 PM Sergey Beryozkin <sberyozkin@gmail.com>
> wrote:
>
>> I've managed to get the PDFParser running in native mode, but I had
>> to delay the initialization of
>> org.apache.pdfbox.pdmodel.font.PDType1Font. This class has static
>> PDType1Font instances, one of which leads to
>> org.apache.fontbox.ttf.RAFDataStream, which opens a file handle. Graal
>> cannot convert that to native code at build time, so one needs
>> to delay the initialization of PDType1Font until run time.
>>
>> If we start from the PDF parser, the call path to RAFDataStream starts
>> from:
>>
>>
>> org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.verifyOrCreateDefaults(PDAcroForm.java:106)
>>      at org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.<init>(PDAcroForm.java:93)
>>      at org.apache.pdfbox.pdmodel.PDDocumentCatalog.getAcroForm(PDDocumentCatalog.java:108)
>>
>> org.apache.tika.parser.pdf.PDFParser.handleXFAOnly(PDFParser.java:534)
>>
>> I guess I may need to create a PR for PDFBox where RAFDataStream opens a
>> stream lazily, with a check like ensureOpen() being added to its read
>> methods...
>>
>> Sergey
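
The lazy-open idea mentioned above (RAFDataStream opening its stream on first use, guarded by an ensureOpen() check in the read methods) could look roughly like the sketch below. This is an illustration only, not PDFBox code: LazyOpenSketch, LazyFileStream and isOpen() are hypothetical names, and the real RAFDataStream has a different and larger API. The point it tries to show is that the constructor only records the file, so creating an instance from a static initializer would not by itself open a file handle.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

public class LazyOpenSketch {

    /** Hypothetical lazily opened stream; NOT the PDFBox RAFDataStream API. */
    static final class LazyFileStream implements AutoCloseable {
        private final File file;
        private RandomAccessFile raf; // stays null until the first read

        LazyFileStream(File file) {
            this.file = file; // only remembers the file, no handle is opened here
        }

        /** Assumed helper for the demo: reports whether a handle is currently open. */
        boolean isOpen() {
            return raf != null;
        }

        /** Opens the underlying file on first use; later calls are no-ops. */
        private void ensureOpen() throws IOException {
            if (raf == null) {
                raf = new RandomAccessFile(file, "r");
            }
        }

        /** Reads one byte (0-255), or -1 at end of file. */
        int read() throws IOException {
            ensureOpen();
            return raf.read();
        }

        @Override
        public void close() throws IOException {
            if (raf != null) {
                raf.close();
                raf = null;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("lazy-open", ".bin");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), new byte[] {42});

        try (LazyFileStream in = new LazyFileStream(tmp)) {
            System.out.println(in.isOpen()); // false: nothing opened yet
            System.out.println(in.read());   // 42: the first read opens the file
            System.out.println(in.isOpen()); // true
        }
    }
}
```

The sketch is not thread-safe, and a read after close() would simply reopen the file; a real change would have to decide how to handle both cases.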
>>
>> On Fri, May 3, 2019 at 1:22 PM Sergey Beryozkin <sberyozkin@gmail.com>
>> wrote:
>>
>>> Yes, please add 'sergeyb', I've just assigned myself a CXF issue as
>>> 'sergeyb'. Sorry about these multiple ids, but indeed I'll try to keep
>>> using a single one.
>>>
>>> Thanks, Sergey
>>>
>>>
>>>
>>> On Fri, May 3, 2019 at 12:13 PM Tim Allison <tallison@apache.org> wrote:
>>>
>>>> I can add 'sergeyb' if you'd prefer!
>>>>
>>>> On Fri, May 3, 2019 at 5:43 AM Sergey Beryozkin <sberyozkin@gmail.com>
>>>> wrote:
>>>> >
>>>> > Though I might need to settle on the 'sergeyb' eventually since it is my
>>>> > apache committer id.
>>>> > Thanks...
>>>> >
>>>> > On Fri, May 3, 2019 at 10:29 AM Sergey Beryozkin <sberyozkin@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > Oh, I forgot I had a 'sergey_beryozkin' id as well, this is not good,
>>>> > > shows how long ago I did contribute :-) (did try sergey.beryozkin though).
>>>> > >
>>>> > > Thanks for checking it, I've just assigned this issue to myself.
>>>> > > Cheers, Sergey
>>>> > >
>>>> > >
>>>> > > On Thu, May 2, 2019 at 6:08 PM Sergey Beryozkin <sberyozkin@gmail.com>
>>>> > > wrote:
>>>> > >
>>>> > >> Hi Tim
>>>> > >>
>>>> > >> I can't assign
>>>> > >> https://issues.apache.org/jira/browse/TIKA-2862
>>>> > >>
>>>> > >> to myself. I used to be able to assign issues; I know I had some time away
>>>> > >> from Tika, but I'm keen to return with a few contributions :-)
>>>> > >> Please update my record so that I am able to assign issues again.
>>>> > >> Cheers, Sergey
>>>> > >>
>>>> > >> On Tue, Apr 30, 2019 at 6:22 PM Sergey Beryozkin <sberyozkin@gmail.com>
>>>> > >> wrote:
>>>> > >>
>>>> > >>> Hi Tim, All
>>>> > >>>
>>>> > >>> I've started working on integrating Tika with Quarkus [1]. The main idea
>>>> > >>> is to be able to use Tika in the native image mode.
>>>> > >>> It's quite likely I'll start creating the PRs soon, to get the native
>>>> > >>> image related issues resolved, these are related to some libraries
>>>> > >>> statically initializing FileDescriptors, etc.
>>>> > >>>
>>>> > >>> Thanks, Sergey
>>>> > >>>
>>>> > >>> [1] https://github.com/sberyozkin/quarkus/tree/tika_extension/extensions/tika
>>>> > >>> [2] https://github.com/sberyozkin/quarkus-quickstarts/tree/tika/getting-started-tika
>>>> > >>>
>>>> > >>>
>>>>
>>>
