tika-dev mailing list archives

From "Tim Kingsbury (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2170) Tika 1.13 ForkParser fails intermittently with very large MS Word docx
Date Mon, 07 Nov 2016 15:46:58 GMT
Tim Kingsbury created TIKA-2170:
-----------------------------------

             Summary: Tika 1.13 ForkParser fails intermittently with very large MS Word docx
                 Key: TIKA-2170
                 URL: https://issues.apache.org/jira/browse/TIKA-2170
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.13
         Environment: Windows 10
            Reporter: Tim Kingsbury


If the ForkParser is run repeatedly in a for-loop against a single large Microsoft Word
DOCX file, it fails intermittently. Sometimes it fails on the very first iteration; sometimes
it runs through several iterations before failing. The results are inconsistent.
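
The loop described above can be sketched roughly as follows. This is not the enclosed test
application, only a minimal harness under stated assumptions: the class RepeatHarness and the
interface Attempt are hypothetical names, and the Tika call shown in the comment is an untested
illustration of where ForkParser would plug in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness (not part of Tika): runs one action repeatedly and
// records which iterations threw, to make intermittent failures visible.
class RepeatHarness {

    /** One parse attempt. Hypothetical interface, introduced for illustration. */
    @FunctionalInterface
    interface Attempt {
        void run() throws Exception;
    }

    /** Runs the attempt the given number of times; returns the 0-based indices that threw. */
    static List<Integer> failedIterations(int iterations, Attempt attempt) {
        List<Integer> failures = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            try {
                attempt.run();
            } catch (Exception e) {
                // Keep going so that later iterations are still exercised.
                failures.add(i);
                System.err.println("iteration " + i + " failed: " + e);
            }
        }
        return failures;
    }

    // With Tika on the classpath, an attempt might look like this (untested sketch,
    // file name is a placeholder):
    //
    //   ForkParser fork = new ForkParser(RepeatHarness.class.getClassLoader(),
    //                                    new AutoDetectParser());
    //   List<Integer> failed = failedIterations(20, () -> {
    //       try (InputStream in = Files.newInputStream(Paths.get("large.docx"))) {
    //           fork.parse(in, new BodyContentHandler(-1), new Metadata(), new ParseContext());
    //       }
    //   });
    //   fork.close();
}
```

Recording the failing iteration indices, rather than stopping at the first exception, shows
whether failures cluster at the start or are spread across the run.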

A small test application is enclosed. For the test, I use a Word DOCX file containing the full
text of "War and Peace" (2.8 MB, 1141 pages of text).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
