tika-dev mailing list archives

From "Manish (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-637) Need API to get list of embedded documents
Date Sun, 10 Apr 2011 22:49:05 GMT
Need API to get list of embedded documents

                 Key: TIKA-637
                 URL: https://issues.apache.org/jira/browse/TIKA-637
             Project: Tika
          Issue Type: New Feature
          Components: parser
    Affects Versions: 1.0
            Reporter: Manish

Apache Tika works well for extracting the content and metadata of documents. It would be even
better if it offered an API that returns each embedded document's input stream along with its
content and metadata.

For example, when extracting a ZIP file, the output could be a list of <text, metadata, inputstream>
tuples, one per contained document, or Tika could invoke a callback for each
<text, metadata, inputstream>. The same API could then serve both for text extraction and for
pulling individual documents out of container files.
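To make the request concrete, here is a minimal sketch of the two shapes described above, a per-document callback and a list built on top of it. It is written in Java against the JDK's java.util.zip only and needs Java 9 or later for readAllBytes. The names EmbeddedDocumentHandler, EmbeddedDocument, forEachEntry and listEntries, as well as the resourceName/size metadata keys, are hypothetical illustrations and are not part of Tika's API. The "text" is a placeholder UTF-8 decode; a real implementation would hand each entry to a Tika parser instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class EmbeddedDocs {

    /** Hypothetical callback: invoked once per embedded document. */
    public interface EmbeddedDocumentHandler {
        void handle(String text, Map<String, String> metadata, InputStream stream)
                throws IOException;
    }

    /** Hypothetical <text, metadata, content> tuple for the list variant. */
    public static final class EmbeddedDocument {
        public final String text;
        public final Map<String, String> metadata;
        public final byte[] content;

        EmbeddedDocument(String text, Map<String, String> metadata, byte[] content) {
            this.text = text;
            this.metadata = metadata;
            this.content = content;
        }
    }

    /** Calls the handler once for every non-directory entry of a ZIP stream. */
    public static void forEachEntry(InputStream zip, EmbeddedDocumentHandler handler)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zip)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no document content
                }
                byte[] bytes = zin.readAllBytes(); // reads only the current entry
                Map<String, String> metadata = new HashMap<>();
                metadata.put("resourceName", entry.getName());
                metadata.put("size", Long.toString(bytes.length));
                // Placeholder extraction: decode as UTF-8. A real implementation
                // would pass the bytes to a Tika parser instead.
                String text = new String(bytes, StandardCharsets.UTF_8);
                handler.handle(text, metadata, new ByteArrayInputStream(bytes));
            }
        }
    }

    /** List variant, built on the callback; buffers every entry in memory. */
    public static List<EmbeddedDocument> listEntries(InputStream zip) throws IOException {
        List<EmbeddedDocument> docs = new ArrayList<>();
        forEachEntry(zip, (text, metadata, stream) ->
                docs.add(new EmbeddedDocument(text, metadata, stream.readAllBytes())));
        return docs;
    }

    /** Demo: builds a small ZIP in memory and prints each embedded document. */
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buf)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/"));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/b.txt"));
            zout.write("world".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        for (EmbeddedDocument doc : listEntries(new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.println(doc.metadata.get("resourceName") + ": " + doc.text);
        }
    }
}
```

Running main should print "a.txt: hello" and then "docs/b.txt: world", skipping the directory entry. The callback form handles one entry at a time, whereas the list form keeps every entry's bytes in memory, so a callback seems the safer default for large containers such as PST files.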

I have already implemented this for ZIP and PST files, but a standard API in Tika would be much
better.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
