tika-dev mailing list archives

From "mungeol heo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file
Date Fri, 04 Sep 2015 01:37:45 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14730149#comment-14730149 ]

mungeol heo commented on TIKA-1728:
-----------------------------------

Detection itself is working:

> java -jar tika-app-1.10.jar -d test_5.0.hwp
> application/x-hwp-v5

However, the Tika app fails with an error when other options are used.
For instance:

> java -jar tika-app-1.10.jar -m test_5.0.hwp
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@4c2a511
> ...and the rest of the error...

I have captured the full error message and attached it as error-message.png.

> Detection is not working properly for detecting HWP 5.0 file
> ------------------------------------------------------------
>
>                 Key: TIKA-1728
>                 URL: https://issues.apache.org/jira/browse/TIKA-1728
>             Project: Tika
>          Issue Type: Bug
>         Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
>            Reporter: mungeol heo
>         Attachments: HWP-document-file-formats-3.0-Korean.pdf, HWP-document-file-formats-5.0-Korean.pdf, error-message.png, test_3.0.hwp, test_5.0.hwp
>
>
> The HWP file type has two formats, HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects an HWP 3.0 file correctly,
> but not an HWP 5.0 file.
> The commands used and the results returned are shown below.
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice
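
One plausible reading of the two results above, offered as an assumption rather than a confirmed diagnosis: an HWP 5.0 file is stored inside an OLE2 (Compound File Binary) container, the same container used by legacy Microsoft Office files, while an HWP 3.0 file begins with a plain text signature. A detector that looks only at the leading bytes would then see a generic OLE2 container for HWP 5.0. The sketch below is not Tika's code. The class name HwpMagicSniffer, the returned labels, and the exact HWP 3.0 signature text are illustrative assumptions. It only shows how the first bytes of the two test files could be compared by hand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Illustrative sketch only; this is not how Tika implements detection.
final class HwpMagicSniffer {

    // Signature of the OLE2 / Compound File Binary container.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumed leading text of an HWP 3.0 file (not taken from the issue).
    static final byte[] HWP3_MAGIC =
        "HWP Document File V3.00".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a file by its leading bytes alone.
    static String sniff(byte[] head) {
        if (startsWith(head, HWP3_MAGIC)) {
            return "hwp-3.0";
        }
        if (startsWith(head, OLE2_MAGIC)) {
            return "ole2-container";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] head = new byte[32];
            int n = in.read(head); // a single read may return fewer than 32 bytes
            System.out.println(sniff(Arrays.copyOf(head, Math.max(n, 0))));
        }
    }
}
```

If test_5.0.hwp prints ole2-container here, telling it apart from a Microsoft Office file would require looking inside the container rather than at the leading bytes, which is consistent with the generic application/x-tika-msoffice result reported above.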



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
