tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2911) Add new parsers
Date Fri, 16 Aug 2019 12:40:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909023#comment-16909023 ]

Tim Allison commented on TIKA-2911:
-----------------------------------

Some other formats that come to mind:
* OOXML strict (probably best handled at the POI level, but we should be able to add that
to the streaming docx/pptx fairly easily)
* Newer Apple iWork file formats
* OneNote
* Binary plist
* Serif PagePlus
* zipx
* ...and?


Then there's a separate category: the need for an unpacker for "large container files" such as:
* warc
* parquet
* ...and?

See some discussion: https://twitter.com/_tallison/status/1149035618321743878?s=20

> Add new parsers
> ---------------
>
>                 Key: TIKA-2911
>                 URL: https://issues.apache.org/jira/browse/TIKA-2911
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>
> Let's use this ticket as the parent for adding new parsers.  This will allow us to have
> a single point of reference for requests/plans for new parsers.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
