tika-dev mailing list archives

From "Prashanth Ramaswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-245) Support of CHM Format
Date Mon, 03 Feb 2014 08:40:10 GMT

    [ https://issues.apache.org/jira/browse/TIKA-245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889317#comment-13889317 ]

Prashanth Ramaswamy commented on TIKA-245:

Nick, thanks for your response. Unfortunately, I am not able to upload the CHM file that
triggers the exception. I may have to check whether other CHM files throw the same
exception.

> Support of CHM Format
> ---------------------
>                 Key: TIKA-245
>                 URL: https://issues.apache.org/jira/browse/TIKA-245
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>         Environment: All
>            Reporter: Karl Heinz Marbaise
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 0.10
>         Attachments: TIKA-245.oleg.20110806.PATCH, TIKA-245.tikhonov.04082011.patch.txt, TIKA-245.tikhonov.20103107.patch.txt, TIKA-245.tikhonov.20112603.txt, TIKA-245.tikhonov.20112703.txt
>
> It might be a good idea to support the Windows CHM file format. Some information about
> it is available at http://en.wikipedia.org/wiki/Microsoft_Compiled_HTML_Help#Extracting_to_HTML.
> The CHM format contains HTML files, which Tika can parse, so the "only" problem is
> extracting the data from the CHM file.

This message was sent by Atlassian JIRA
