tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2701) Text is not extracted properly from WMF files
Date Thu, 02 Aug 2018 05:23:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566365#comment-16566365 ]

ASF GitHub Bot commented on TIKA-2701:
--------------------------------------

grigoriy opened a new pull request #245: fix for TIKA-2701 contributed by grigoriy
URL: https://github.com/apache/tika/pull/245
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Text is not extracted properly from WMF files
> ---------------------------------------------
>
>                 Key: TIKA-2701
>                 URL: https://issues.apache.org/jira/browse/TIKA-2701
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Grigoriy Alekseev
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: thumbnail_1.wmf
>
>
> Text is always extracted as if it were encoded in cp1252 (Windows-1252). The attached thumbnail_1.wmf
> contains text in Shift JIS, which is therefore extracted incorrectly. The expected output is 普林斯.
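
The symptom described in the issue can be reproduced outside Tika with a minimal sketch like the one below. It is not the code from the pull request; the class name WmfCharsetDemo and the helper decode are hypothetical, and the sketch assumes the JDK in use ships the Shift_JIS and windows-1252 charsets (full JDK distributions normally do). It only shows that the same bytes decode correctly with Shift_JIS and turn into mojibake when cp1252 is assumed.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Tika.
public class WmfCharsetDemo {

    // Decode raw bytes with the named charset (illustrative helper).
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The expected text from the issue, written with Unicode escapes
        // so the source file encoding does not matter.
        String expected = "\u666E\u6797\u65AF";

        // Stand-in for the text bytes stored in the WMF file.
        byte[] raw = expected.getBytes(Charset.forName("Shift_JIS"));

        // Decoding with a hard-coded cp1252 assumption yields garbage.
        System.out.println("cp1252:    " + decode(raw, "windows-1252"));

        // Decoding with the charset the text was written in recovers it.
        System.out.println("Shift_JIS: " + decode(raw, "Shift_JIS"));
        System.out.println("round trip ok: " + decode(raw, "Shift_JIS").equals(expected));
    }
}
```

How a parser should discover the right charset for a given WMF record is a separate question that this sketch does not address.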



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
