tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file
Date Thu, 03 Sep 2015 11:12:45 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728863#comment-14728863 ]

Nick Burch commented on TIKA-1728:
----------------------------------

The issue is that the v3 files (and earlier?) are in their own wrapper, while the v5 (and
later?) ones are stored within an OLE2 structure.

As of r1700986, the v3 files continue to be detected as {{application/x-hwp}}, while the v5
ones are now detected as {{application/x-hwp-v5}}
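
For illustration only, the wrapper-versus-OLE2 distinction described above can be sketched as a check on the leading bytes of a file. This is not Tika's detector code: the class {{HwpSniffer}}, its {{Kind}} enum and the {{sniff}} method are hypothetical names, and the ASCII prefix used for the older wrapper format is an assumption that should be checked against the attached format documents. The 8-byte OLE2 signature is the standard compound-document magic number.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: classifies a file by its leading bytes.
class HwpSniffer {
    enum Kind { HWP_V3_WRAPPER, OLE2_CONTAINER, UNKNOWN }

    // Standard OLE2 compound document signature (first 8 bytes of the file).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ASSUMPTION: pre-v5 files begin with this ASCII text in their own wrapper.
    private static final byte[] HWP_V3_PREFIX =
        "HWP Document File V".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with every byte of prefix, in order.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies the first bytes of a file. An OLE2 result only identifies the
    // container; it does not by itself prove that the content is HWP v5.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2_CONTAINER;
        }
        if (startsWith(header, HWP_V3_PREFIX)) {
            return Kind.HWP_V3_WRAPPER;
        }
        return Kind.UNKNOWN;
    }
}
```

A magic-byte check like this can at best report "some OLE2 file" for a v5 document, which is consistent with the generic {{application/x-tika-msoffice}} result in the original report. Telling HWP v5 apart from other OLE2-based formats requires looking at the entries inside the container.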

It'd be helpful if someone could confirm what the very latest version of the file format is, so
we can decide whether the v5 suffix on the mimetype is a suitable name, or whether we should
make it more general.

> Detection is not working properly for detecting HWP 5.0 file
> ------------------------------------------------------------
>
>                 Key: TIKA-1728
>                 URL: https://issues.apache.org/jira/browse/TIKA-1728
>             Project: Tika
>          Issue Type: Bug
>         Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
>            Reporter: mungeol heo
>         Attachments: HWP-document-file-formats-3.0-Korean.pdf, HWP-document-file-formats-5.0-Korean.pdf, test_3.0.hwp, test_5.0.hwp
>
>
> HWP files come in two formats: HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly,
> but not HWP 5.0 files.
> The commands used and the results returned are listed below.
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
