tika-dev mailing list archives

From "Tomas Safarik (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-1194) Missing text from MS Word (DOC) file
Date Tue, 12 Nov 2013 10:13:17 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomas Safarik updated TIKA-1194:
--------------------------------

    Description: 
Hello,

we noticed that the filtered text from some MS Word DOC files is missing one line (in a table cell)
that is present in the original document.

- If you add or remove one character anywhere before the problematic line/cell, the filtered
text is correct. If you restore the original text, the problem comes back.
- If the file is re-saved as DOCX, filtering works fine.

I will provide a sample document. Please let me know if more information is needed.

Regards,

Tomas

  was:
Hello,

we noticed that the filtered text from some MS Word DOC files is missing one line (in a table) that
is present in the original document.

- If you add or remove one character anywhere before the problematic line, the filtered text is
correct. If you get the text back to the original, the filtering problem is back.
- If the file is re-saved as DOCX, filtering works fine.

I will provide a sample document. Please let me know if more information is needed.

Regards,

Tomas


> Missing text from MS Word (DOC) file
> ------------------------------------
>
>                 Key: TIKA-1194
>                 URL: https://issues.apache.org/jira/browse/TIKA-1194
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Tomas Safarik
>            Priority: Critical
>         Attachments: OP-06-015.doc
>
>
> Hello,
> we noticed that the filtered text from some MS Word DOC files is missing one line (in a table cell) that is present in the original document.
> - If you add or remove one character anywhere before the problematic line/cell, the filtered text is correct. If you restore the original text, the problem comes back.
> - If the file is re-saved as DOCX, filtering works fine.
> I will provide a sample document. Please let me know if more information is needed.
> Regards,
> Tomas
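One way to pin down exactly which line is lost is to extract text from the DOC file and from its DOCX re-save (for example with Tika's `Tika.parseToString(File)`), then list the lines the DOCX output contains but the DOC output lacks. The `MissingLines` class below is a hypothetical helper written for illustration. It is not part of Tika and takes the two extracted strings as plain input. It compares trimmed, non-blank lines only, so differences in whitespace, or in how table cells are delimited, are ignored rather than reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Tika): reports lines that appear in a
 * reference extraction (e.g. from the DOCX re-save) but are missing from a
 * candidate extraction (e.g. from the original DOC).
 */
public final class MissingLines {

    private MissingLines() {}

    public static List<String> missingLines(String reference, String candidate) {
        // Count every non-blank, trimmed line of the candidate text.
        Map<String, Integer> remaining = new HashMap<>();
        for (String line : candidate.split("\\R")) {
            String key = line.trim();
            if (!key.isEmpty()) {
                remaining.merge(key, 1, Integer::sum);
            }
        }
        // Walk the reference text in order; any line without a remaining
        // matching occurrence in the candidate is reported as missing.
        List<String> missing = new ArrayList<>();
        for (String line : reference.split("\\R")) {
            String key = line.trim();
            if (key.isEmpty()) {
                continue;
            }
            Integer count = remaining.get(key);
            if (count != null && count > 0) {
                remaining.put(key, count - 1);
            } else {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Stand-ins for the DOCX and DOC extractions of the same document.
        String docx = "Header\nCell A\nCell B\nFooter";
        String doc = "Header\nCell A\nFooter";
        System.out.println(missingLines(docx, doc)); // prints [Cell B]
    }
}
```

Counting occurrences (rather than using a plain set) means a line that appears twice in the DOCX output but only once in the DOC output is still reported.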



--
This message was sent by Atlassian JIRA
(v6.1#6144)
