tika-dev mailing list archives

From "Steve Gullion (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2036) Deleted Text from Word File Shows Up in Extract
Date Fri, 15 Jul 2016 23:49:20 GMT
Steve Gullion created TIKA-2036:
-----------------------------------

             Summary: Deleted Text from Word File Shows Up in Extract
                 Key: TIKA-2036
                 URL: https://issues.apache.org/jira/browse/TIKA-2036
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 1.13
         Environment: Windows, under TikaOnDotNet
            Reporter: Steve Gullion


A .docx file that was edited with "track changes" on contains deleted text. In this case,
there are two overlapping deletions, one nested inside the other:

9.	[DELETED:This Agreement shall be governed by and construed in accordance with [INSERTED,
THEN DELETED:Arizona] New York law] (Intentionally omitted.)

The extracted text should only include "9. (Intentionally omitted.)". However, the output is "9. This
Agreement shall be governed and construed in accordance with New York law." So Tika recognizes
"Arizona" as deleted, but not the rest of the enclosing deletion.

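To make the expected behaviour concrete, here is a minimal sketch in Python (standard library only, not Tika code). The XML fragment is a hypothetical reconstruction of clause 9, since the reporter's actual document.xml is not attached; it assumes the usual WordprocessingML layout, where tracked deletions are wrapped in w:del and text that was inserted and then deleted appears as a w:del inside a w:ins. The helper name visible_text is made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical reconstruction of clause 9; the real file's XML is not
# shown in the report, so ids, authors, and run boundaries are assumed.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">9. </w:t></w:r>
  <w:del w:id="1" w:author="a"><w:r><w:delText xml:space="preserve">This Agreement shall be governed by and construed in accordance with </w:delText></w:r></w:del>
  <w:ins w:id="2" w:author="a"><w:del w:id="3" w:author="b"><w:r><w:delText>Arizona</w:delText></w:r></w:del></w:ins>
  <w:del w:id="4" w:author="a"><w:r><w:delText xml:space="preserve"> New York law</w:delText></w:r></w:del>
  <w:r><w:t>(Intentionally omitted.)</w:t></w:r>
</w:p>"""


def visible_text(elem):
    """Concatenate w:t text, skipping every w:del subtree entirely."""
    if elem.tag == f"{{{W}}}del":
        return ""  # tracked deletion: nothing below it is visible
    parts = []
    if elem.tag == f"{{{W}}}t" and elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(visible_text(child))
    return "".join(parts)


paragraph = ET.fromstring(SAMPLE)
print(visible_text(paragraph))  # prints: 9. (Intentionally omitted.)
```

If the extractor treated every w:del subtree this way, whether or not it sits inside a w:ins, the output for this clause would be the expected "9. (Intentionally omitted.)".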


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
