tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-2352) Incorrect EOF exception in WordPerfect parser
Date Tue, 02 May 2017 19:52:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993599#comment-15993599 ]

Tim Allison edited comment on TIKA-2352 at 5/2/17 7:51 PM:
-----------------------------------------------------------

{noformat}00002FF0              0A 0C D0 08 0A 00 00 C8 00 06 00 00      ..Ð....È....
00003000  0A 00 08 D0 D3 04 0A 00 01 00 01 00 00 00 0A 00  ...ÐÓ...........
00003010  04 D3 C3 0C C3 0A C1 E0 C1 10 EC 13 23 00 C1 20  .ÓÃ.Ã.ÁàÁ.ì.#.Á 
00003020  C3 02 C3                                         Ã.Ã
{noformat}

followed by the text {{1.  INTRODUCTION}}.

It looks like {{C1 E0 C1}} is treated as a complete {{C1}} skip, after which {{EC}} is interpreted as the start
of a variable-length multi-byte function of length {{23}}. Judging from the text that LibreOffice displays, however,
{{EC}} should not be interpreted as the start of a variable-length function.
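
To make this byte-level reading concrete, here is a minimal Java sketch of the interpretation described above. It is not Tika's code: {{skipToNextSameByte}}, {{findVariableLengthStart}} and {{lengthWord}} are hypothetical helpers, and the sketch assumes (based on my reading of the WP 5.x layout, not verified against Tika) that a variable-length function starts with the code byte, then a subfunction byte, then a 16-bit little-endian length word. Under those assumptions, skipping from the first {{C1}} to the next {{C1}} leaves {{EC 13 23 00}} to be read as a function header with length 0x23 (35).

```java
public class Wp5NaiveSkipSketch {

    // Bytes 0x3016..0x3022 from the hex dump above: C1 E0 C1 10 EC 13 23 00 C1 20 C3 02 C3
    static final int[] EXCERPT = {
        0xC1, 0xE0, 0xC1, 0x10, 0xEC, 0x13, 0x23, 0x00, 0xC1, 0x20, 0xC3, 0x02, 0xC3
    };

    /** Hypothetical helper: skip from a fixed-length code to the next occurrence of the same byte. */
    static int skipToNextSameByte(int[] data, int start) {
        int code = data[start];
        for (int i = start + 1; i < data.length; i++) {
            if (data[i] == code) {
                return i + 1; // index just after the closing code byte
            }
        }
        throw new IllegalStateException("unterminated function");
    }

    /** Hypothetical helper: index of the first byte in the 0xD0..0xFF range at or after 'from', or -1. */
    static int findVariableLengthStart(int[] data, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] >= 0xD0) {
                return i;
            }
        }
        return -1;
    }

    /** ASSUMPTION: the length word is little-endian and follows the code and subfunction bytes. */
    static int lengthWord(int[] data, int codePos) {
        return data[codePos + 2] | (data[codePos + 3] << 8);
    }

    public static void main(String[] args) {
        int afterSkip = skipToNextSameByte(EXCERPT, 0);
        int ec = findVariableLengthStart(EXCERPT, afterSkip);
        System.out.printf("naive skip ends at excerpt index %d%n", afterSkip);
        System.out.printf("variable-length code 0x%02X at index %d, length word 0x%04X (%d)%n",
            EXCERPT[ec], ec, lengthWord(EXCERPT, ec), lengthWord(EXCERPT, ec));
    }
}
```

Running {{main}} prints that the naive skip ends at excerpt index 3 and that {{0xEC}} at index 4 carries a length word of {{0x0023}} (35).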

[~pascal.essiembre], I wonder: if {{C1...C1...C1}} were a valid skip pattern, then {{EC}}
would be enclosed in the skipped content, and we could resume with {{C3 02 C3}} and then the
text.
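
A similar sketch of this hypothesis, again using hypothetical helpers rather than Tika's actual API. {{skipThreeDelimiters}} treats {{C1...C1...C1}} as one unit; {{skipFixedLength}} with {{ASSUMED_C1_SIZE = 9}} encodes my assumption (not verified against Tika's code) that the WP 5.x format gives {{C1}} a fixed total size of 9 bytes, counting both delimiting {{C1}} bytes. In this excerpt both approaches end at the same offset, just past the third {{C1}}, so {{EC}} falls inside the skipped content.

```java
public class Wp5TripleC1SkipSketch {

    // Bytes 0x3016..0x3022 from the hex dump above: C1 E0 C1 10 EC 13 23 00 C1 20 C3 02 C3
    static final int[] EXCERPT = {
        0xC1, 0xE0, 0xC1, 0x10, 0xEC, 0x13, 0x23, 0x00, 0xC1, 0x20, 0xC3, 0x02, 0xC3
    };

    // ASSUMPTION: total size of the C1 function in WP 5.x, including both C1 delimiters.
    static final int ASSUMED_C1_SIZE = 9;

    /** Hypothetical "C1...C1...C1" skip: returns the index just after the third occurrence of the code byte. */
    static int skipThreeDelimiters(int[] data, int start) {
        int code = data[start];
        int seen = 0;
        for (int i = start; i < data.length; i++) {
            if (data[i] == code && ++seen == 3) {
                return i + 1;
            }
        }
        throw new IllegalStateException("fewer than three 0x" + Integer.toHexString(code) + " bytes");
    }

    /** Alternative: skip a fixed number of bytes, as a fixed-length function would. */
    static int skipFixedLength(int start, int size) {
        return start + size;
    }

    public static void main(String[] args) {
        int resume = skipThreeDelimiters(EXCERPT, 0);
        int ecIndex = 4; // position of 0xEC within EXCERPT
        System.out.printf("resume at index %d (0x%02X); EC enclosed: %b%n",
            resume, EXCERPT[resume], ecIndex < resume);
        System.out.printf("fixed %d-byte skip resumes at index %d%n",
            ASSUMED_C1_SIZE, skipFixedLength(0, ASSUMED_C1_SIZE));
    }
}
```

The byte at the resume point is {{0x20}} (presumably a space), followed by {{C3 02 C3}}. If the fixed-size assumption holds, skipping by size rather than scanning for a matching delimiter might be the more robust approach, since it does not depend on the payload never containing the code byte.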


was (Author: tallison@mitre.org):
{noformat}00002FF0              0A 0C D0 08 0A 00 00 C8 00 06 00 00      ..Ð....È....
00003000  0A 00 08 D0 D3 04 0A 00 01 00 01 00 00 00 0A 00  ...ÐÓ...........
00003010  04 D3 C3 0C C3 0A C1 E0 C1 10 EC 13 23 00 C1 20  .ÓÃ.Ã.ÁàÁ.ì.#.Á 
00003020  C3 02 C3                                         Ã.Ã
{noformat}

It looks like {{C1 E0 C1}} is a complete {{C1}} skip, then {{EC}} is interpreted as the start
of a variable-length multi-byte function of length {{23}}, which, judging from the text, is not what it
should be...

[~pascal.essiembre], I wonder: if {{C1...C1...C1}} were a valid skip pattern, then {{EC}}
would be enclosed in the skipped content, and we could resume with {{C3 02 C3}} and then the
text.

> Incorrect EOF exception in WordPerfect parser
> ---------------------------------------------
>
>                 Key: TIKA-2352
>                 URL: https://issues.apache.org/jira/browse/TIKA-2352
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Trivial
>         Attachments: 462321.wp
>
>
> We have a few EOF exceptions in WordPerfect files that are likely not truncated. The
> example I'll attach shortly opens without complaint in LibreOffice.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
