tika-dev mailing list archives

From "Nick Burch (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-637) Need API to get list of embedded documents
Date Tue, 24 Jan 2012 14:54:43 GMT

     [ https://issues.apache.org/jira/browse/TIKA-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Nick Burch resolved TIKA-637.

    Resolution: Not A Problem

Closing as "Not A Problem", as this is handled by supplying a recursing parser on the ParseContext.
For an example of this, see how the -z option in TikaCLI works.
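
A minimal sketch of the approach described in the resolution, assuming Tika 1.x or later with the parser modules (e.g. tika-parsers) on the classpath. It registers a custom `EmbeddedDocumentExtractor` on the `ParseContext`, so each embedded document's metadata and raw bytes are collected instead of being parsed inline. This is roughly in the spirit of what the -z option does when it writes embedded files to disk. The names `EmbeddedLister`, `Embedded`, `CollectingExtractor` and `listEmbedded` are hypothetical. If you only need the embedded documents' text folded into the main output, registering the parser itself with `context.set(Parser.class, parser)` is the simpler "recursing parser" form.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.tika.exception.TikaException;
import org.apache.tika.extractor.EmbeddedDocumentExtractor;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

public class EmbeddedLister {

    /** Hypothetical holder for one embedded document: its metadata and raw bytes. */
    public static final class Embedded {
        public final Metadata metadata;
        public final byte[] bytes;

        Embedded(Metadata metadata, byte[] bytes) {
            this.metadata = metadata;
            this.bytes = bytes;
        }

        /** "resourceName" is the key container parsers use for an entry's file name. */
        public String name() {
            return metadata.get("resourceName");
        }
    }

    /** Collects every embedded document instead of parsing it. */
    static final class CollectingExtractor implements EmbeddedDocumentExtractor {
        final List<Embedded> found = new ArrayList<>();

        @Override
        public boolean shouldParseEmbedded(Metadata metadata) {
            return true; // accept every embedded document
        }

        @Override
        public void parseEmbedded(InputStream stream, ContentHandler handler,
                                  Metadata metadata, boolean outputHtml)
                throws SAXException, IOException {
            // Buffer the raw bytes; nothing is written to the outer handler.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            found.add(new Embedded(metadata, buf.toByteArray()));
        }
    }

    /** Parses a container (zip, PST, ...) and returns its direct embedded documents. */
    public static List<Embedded> listEmbedded(InputStream container)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        CollectingExtractor extractor = new CollectingExtractor();
        ParseContext context = new ParseContext();
        context.set(EmbeddedDocumentExtractor.class, extractor);
        // -1 disables the write limit on the container's own text.
        parser.parse(container, new BodyContentHandler(-1), new Metadata(), context);
        return extractor.found;
    }
}
```

Text for each entry can then be obtained by running a parser again over `new ByteArrayInputStream(doc.bytes)`. Because embedded documents are collected rather than parsed, this sketch does not descend into nested containers, and buffering whole entries in memory is only reasonable for modest sizes.
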
> Need API to get list of embedded documents
> ------------------------------------------
>                 Key: TIKA-637
>                 URL: https://issues.apache.org/jira/browse/TIKA-637
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 0.10
>            Reporter: Manish
> Apache Tika works great for extracting the content and metadata of documents,
> but it would be great if it had APIs that give you each individual document's input stream
> along with its content and metadata.
> For example, when extracting zip files, the output could be a list of <text, metadata, inputstream>
> for each document, or there could be a callback for each <text, metadata, inputstream>. That could
> then be used both for text extraction and for extracting individual documents from container files.
> I have already done this for zip and also PST, but it would be great to have a standard API.
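
The callback shape the reporter describes can be illustrated for ZIP files alone using only the JDK's `java.util.zip`, independent of Tika. This is a hypothetical sketch of the request, not a Tika API: `ZipEntryCallbackSketch`, `EntryCallback`, `NonClosing` and the `name`/`size` metadata keys are invented for illustration, and text extraction is left out.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipEntryCallbackSketch {

    /** Hypothetical callback: receives each entry's metadata and content stream. */
    @FunctionalInterface
    public interface EntryCallback {
        void onEntry(Map<String, String> metadata, InputStream content) throws IOException;
    }

    /** Stops a callback from closing the shared ZIP stream. */
    static final class NonClosing extends FilterInputStream {
        NonClosing(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally a no-op: the ZIP stream must stay open for the next entry.
        }
    }

    /** Invokes the callback once per file entry and returns how many entries were seen. */
    public static int forEachEntry(InputStream zip, EntryCallback callback) throws IOException {
        int count = 0;
        try (ZipInputStream zis = new ZipInputStream(zip)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no content
                }
                Map<String, String> metadata = new HashMap<>();
                metadata.put("name", entry.getName());
                if (entry.getSize() >= 0) { // size is -1 when not known up front
                    metadata.put("size", Long.toString(entry.getSize()));
                }
                // ZipInputStream only yields the current entry's bytes until
                // getNextEntry() is called, so the callback sees just this entry.
                callback.onEntry(metadata, new NonClosing(zis));
                count++;
            }
        }
        return count;
    }
}
```

The callback form avoids buffering every entry at once, which matters for large containers such as PST files; a list-returning variant can be built on top of it when that is not a concern.
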


