tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2352) Incorrect EOF exception in WordPerfect parser
Date Thu, 04 May 2017 00:37:04 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-2352:
------------------------------
    Attachment: reports.zip

Yes, that fixed several problems and introduced no new exceptions.  I'm attaching the relevant
reports.  There still appear to be some rare(ish) EOF exceptions in WordPerfect 5.1 files, and
there may be room for improvement in {{application/x-quattro-pro; version=9}}.

We should ignore EOF exceptions on Common Crawl files that are close to 1 MB in size, which
typically means they were truncated and legitimately hit EOF (e.g. the one exception for
{{application/vnd.wordperfect; version=6.x}}).
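
As a rough illustration of that filtering rule, here is a minimal sketch in Java. It is not part of Tika or of the attached reports: the class name, the method names, the 1 MiB limit, and the 1 KiB tolerance are all assumptions chosen for the example, and the right threshold would need to be checked against the actual corpus.

```java
import java.io.EOFException;

// Hypothetical helper (not a Tika class): decides whether an EOF exception
// seen while parsing a crawled file is probably caused by truncation.
final class TruncationHeuristic {

    // Assumption: crawled files are cut off at about 1 MiB (1,048,576 bytes).
    static final long TRUNCATION_LIMIT_BYTES = 1024L * 1024L;

    // Assumption: treat anything within 1 KiB of the limit as "near 1 MB".
    static final long TOLERANCE_BYTES = 1024L;

    private TruncationHeuristic() {
    }

    // True when the file length is within the tolerance of the assumed limit.
    static boolean isLikelyTruncated(long fileLengthBytes) {
        return Math.abs(fileLengthBytes - TRUNCATION_LIMIT_BYTES) <= TOLERANCE_BYTES;
    }

    // True only for EOFException on a file whose size suggests truncation;
    // every other exception, or an EOF on a smaller file, is still reported.
    static boolean shouldIgnore(Throwable t, long fileLengthBytes) {
        return t instanceof EOFException && isLikelyTruncated(fileLengthBytes);
    }
}
```

With these assumed values, an EOFException on a 1,048,576-byte file would be skipped in the reports, while the same exception on a 200,000-byte file would still be counted as a likely parser bug.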

Thank you, again!

> Incorrect EOF exception in WordPerfect parser
> ---------------------------------------------
>
>                 Key: TIKA-2352
>                 URL: https://issues.apache.org/jira/browse/TIKA-2352
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Trivial
>             Fix For: 2.0, 1.15
>
>         Attachments: 462321.wp, reports.zip
>
>
> We have a few EOF exceptions in WordPerfect files that are likely not truncated.  The
> example I'll attach shortly opens without complaint in LibreOffice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
