tika-dev mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2224) OneNote formats support - Mime Magic and Parser
Date Thu, 12 Dec 2019 16:45:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994865#comment-16994865 ]

ASF GitHub Bot commented on TIKA-2224:

nddipiazza commented on pull request #303: TIKA-2224 OneNote parser support
URL: https://github.com/apache/tika/pull/303
   # OneNote parser
   The following adds `.one` file format parsing support. 
   `application/onenote; format=one`
   Supports embedded documents as well. 

> OneNote formats support - Mime Magic and Parser
> -----------------------------------------------
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers,
> we don't have any magic for the OneNote formats. Several years ago we dug out the file format
> specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't
> have volunteer energy to implement a parser. However, armed with those specs, we should be
> able to come up with some mime magic for detection.

This message was sent by Atlassian Jira
