tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1958) Add mime detection and lightweight parsers for Office 2003 Word and Excel formats
Date Tue, 26 Apr 2016 16:50:12 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258429#comment-15258429
] 

Nick Burch commented on TIKA-1958:
----------------------------------

On the detection, I can't remember; probably best to just try it and add a unit test!

For the mime type, I'd suggest something like {{application/vnd.ms-spreadsheetml}}, to be more
in keeping with our other related formats.
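
As a starting point for the "try it + unit test" approach, here is a minimal sketch of root-element detection for the two Office 2003 XML formats. It is written in Python for brevity and is not Tika code. The function name {{detect_office_2003_xml}} is hypothetical, and the Word mime string is only an assumed analogue of the spreadsheet name suggested above; neither mime string is a settled or registered type. The root element names and namespaces are the ones used by the Office 2003 XML schemas.

```python
import io
import xml.etree.ElementTree as ET

# Root elements of the Office 2003 XML formats, in ElementTree's
# "{namespace}localname" notation.
# The MIME strings are placeholders: the first is the name suggested in
# this comment, the second is an assumed analogue for Word.
ROOT_TO_MIME = {
    "{urn:schemas-microsoft-com:office:spreadsheet}Workbook":
        "application/vnd.ms-spreadsheetml",
    "{http://schemas.microsoft.com/office/word/2003/wordml}wordDocument":
        "application/vnd.ms-wordml",
}


def detect_office_2003_xml(data: bytes):
    """Return a MIME string based on the root element, or None if unknown."""
    try:
        # The first "start" event belongs to the document's root element,
        # so we return as soon as we see it.
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            return ROOT_TO_MIME.get(elem.tag)
    except ET.ParseError:
        # Not well-formed XML (or empty input): not one of our formats.
        return None
    return None
```

A unit test along these lines would feed the function one small sample per format plus a few negatives (plain HTML, non-XML bytes) and check the returned type.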

> Add mime detection and lightweight parsers for Office 2003 Word and Excel formats
> ---------------------------------------------------------------------------------
>
>                 Key: TIKA-1958
>                 URL: https://issues.apache.org/jira/browse/TIKA-1958
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: 2010-cal-eu.xls, excel_msword_2003.tar.bz2
>
>
> Over on POI, a user asked if we supported 2003 xls (xml) files.  It would be neat if we could add mime detection and a "good enough" parser to handle 2003 xls and doc files.
> This could be a great task for someone wanting to get started in contributing to Tika.
> references:
> https://mail-archives.apache.org/mod_mbox/poi-user/201604.mbox/%3Calpine.BSO.2.20.1604210825140.22929%40ref.nmedia.net%3E
> https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats
> https://msdn.microsoft.com/en-us/library/bb226687(v=office.11).aspx



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
