tika-dev mailing list archives

From "Aakarsh Medleri Hire Math (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1532) DIF Parser
Date Sun, 22 Feb 2015 15:10:11 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332210#comment-14332210 ]

Aakarsh Medleri Hire Math commented on TIKA-1532:
-------------------------------------------------

Hi Nick,

Sorry for the delayed response.
It seems there is no unique MIME type associated with GCMD .dif files. We crawled
around 8000 files from the ACADIS website (https://www.aoncadis.org), and all of them were
served with the content type text/plain. However, the data itself is XML. Does
that mean Tika should detect these files as application/xml or text/xml?
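
Since the served content type does not help here, one option might be to look at the XML root element rather than the HTTP header. Below is a minimal sketch using only the JDK's StAX API. It assumes the root element of a GCMD DIF record is named DIF. The class and method names (DifSniffer, rootElementName, looksLikeDif) are hypothetical and not part of Tika, and the sample record in main is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: sniffs the root element of a stream.
final class DifSniffer {

    // Returns the local name of the first element, or null if the
    // stream is not well-formed XML (e.g. genuinely plain text).
    static String rootElementName(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in crawled content.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not parseable as XML; fall through and report "no root element".
        }
        return null;
    }

    // Assumption: a DIF record has a root element named "DIF".
    static boolean looksLikeDif(InputStream in) {
        return "DIF".equals(rootElementName(in));
    }

    public static void main(String[] args) {
        // Made-up sample, only to exercise the sketch.
        String sample = "<?xml version=\"1.0\"?><DIF><Entry_ID>example</Entry_ID></DIF>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("Looks like DIF: " + looksLikeDif(in));
    }
}
```

With a check along these lines, a file served as text/plain could still be recognised as XML, and as DIF specifically, from its content alone. Whether Tika should then report a generic XML type or a DIF-specific one is still the open question.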

Here is one such example: https://www.aoncadis.org/dataset/Zamora2010.dif

You can find the rest of the crawled links at:
https://raw.githubusercontent.com/shekarprashant/TikaDirectedResearch/master/Acadis%20Complete%20Crawl%20Raw%20Results.csv

Looking forward to your input.

Thanks,
Aakarsh

> DIF Parser
> ----------
>
>                 Key: TIKA-1532
>                 URL: https://issues.apache.org/jira/browse/TIKA-1532
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Aakarsh Medleri Hire Math
>
> MIME Type detection & content parser for .dif format



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
