tika-dev mailing list archives

From "Nicholas DiPiazza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2224) Mime magic for OneNote formats
Date Tue, 15 Jan 2019 03:33:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742717#comment-16742717 ]

Nicholas DiPiazza commented on TIKA-2224:

Where are we at with this? 

There is already a OneNote parser implemented in C++: https://github.com/dropbox/onenote-parser

We could port this code to Java without much fuss.

Has anyone done this already?

> Mime magic for OneNote formats
> ------------------------------
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.one, note-ssn-test-mmmm.one
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers,
> we don't have any magic for the OneNote formats. Several years ago we dug out the file format
> specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't
> have volunteer energy to implement a parser. However, armed with those specs, we should be
> able to come up with some mime magic for detection.
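
For illustration only, here is a minimal Java sketch of what magic-based detection for a OneNote section file could look like: compare the first 16 bytes of the file against a known signature. The byte values are an assumption, taken from the guidFileType GUID {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} that the file format spec is understood to place at offset 0 of .one files, and should be verified against the attached samples. The class and method names are hypothetical; this is not how Tika implements detection, which would normally be a magic entry in its mime type definitions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class OneNoteMagicSketch {
    // Assumed signature: GUID {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} in its
    // on-disk layout (first three GUID fields little-endian). Verify against
    // real .one files before relying on it.
    public static final byte[] ONE_SECTION_MAGIC = {
        (byte) 0xE4, 0x52, 0x5C, 0x7B, (byte) 0x8C, (byte) 0xD8, (byte) 0xA7, 0x4D,
        (byte) 0xAE, (byte) 0xB1, 0x53, 0x78, (byte) 0xD0, 0x29, (byte) 0x96, (byte) 0xD3
    };

    // Returns true if the buffer starts with the assumed 16-byte signature.
    public static boolean looksLikeOneNoteSection(byte[] header) {
        if (header == null || header.length < ONE_SECTION_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, ONE_SECTION_MAGIC.length);
        return Arrays.equals(prefix, ONE_SECTION_MAGIC);
    }

    // Reads only the first 16 bytes of the file and checks them.
    public static boolean looksLikeOneNoteSection(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[ONE_SECTION_MAGIC.length];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == header.length && looksLikeOneNoteSection(header);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + looksLikeOneNoteSection(Paths.get(arg)));
        }
    }
}
```

Running it against the attachments on this issue (for example `java OneNoteMagicSketch Sample1.one`) would be a quick way to check whether the assumed signature holds.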

This message was sent by Atlassian JIRA