tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files
Date Thu, 02 Aug 2018 05:23:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566365#comment-16566365 ]

ASF GitHub Bot commented on TIKA-2701:

grigoriy opened a new pull request #245: fix for TIKA-2701 contributed by grigoriy
URL: https://github.com/apache/tika/pull/245

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> Text is not extracted properly from WMF files
> ---------------------------------------------
>                 Key: TIKA-2701
>                 URL: https://issues.apache.org/jira/browse/TIKA-2701
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Grigoriy Alekseev
>            Priority: Major
>             Fix For: 2.0.0
>         Attachments: thumbnail_1.wmf
> Text is always extracted assuming it is in cp-1252 encoding. The attached thumbnail_1.wmf
> has text in Shift JIS, so it is extracted incorrectly. The expected output is 普林斯.
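
The mismatch described above can be reproduced outside Tika. The following is a minimal Python sketch, not Tika's WMF parser code: it assumes the text record simply holds the Shift JIS bytes of the expected string, and the variable names are illustrative only.

```python
# Illustration only: this is not Tika's WMF parsing code.
# Assumption: the WMF text record stores the string as raw Shift JIS bytes.
expected = "普林斯"
raw = expected.encode("shift_jis")

# Decoding with a hard-coded cp-1252 charset, as the report describes.
# errors="replace" keeps the sketch from raising on bytes cp-1252 leaves undefined.
wrong = raw.decode("cp1252", errors="replace")

# Decoding with the charset the bytes were actually written in.
right = raw.decode("shift_jis")

print("raw bytes:     ", raw.hex(" "))
print("as cp-1252:    ", wrong)
print("as Shift JIS:  ", right)
```

Each character occupies two bytes in Shift JIS, so the cp-1252 reading yields six unrelated single-byte characters instead of three. A fix would presumably need to choose the decoder from charset information in the file rather than assume cp-1252.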

This message was sent by Atlassian JIRA