tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2045) TIKA crashes / runs out of memory on simple PDF
Date Mon, 08 Aug 2016 10:50:20 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411654#comment-15411654 ]

Tim Allison commented on TIKA-2045:

PDFBOX-3442 was just fixed.  We can close this out on the next PDFBox release.

> TIKA crashes / runs out of memory on simple PDF
> -----------------------------------------------
>                 Key: TIKA-2045
>                 URL: https://issues.apache.org/jira/browse/TIKA-2045
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.13
>         Environment: Linux, Java 8
>            Reporter: Egbert
> We're using TIKA embedded in a webcrawler and today I've encountered a PDF that results in OutOfMemory errors while being processed by TIKA.
> It's a small, 1 page PDF file, so I don't think that it should consume that much memory.
> I verified the problem by using the GUI from the tika-app-1.13.jar file and that results in the same error on the same file. The file can be found at:
> http://www.spesmea.nl/pdf/algemene_voorwaarden_bbztcn_2010_nl.pdf
> If I can help by providing any additional information, please let me know.

