[ https://issues.apache.org/jira/browse/OODT-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754124#comment-13754124 ]

Chris A. Mattmann commented on OODT-652:
----------------------------------------

+1, great discussion above, Bfost and Reesh. +1 to keeping it as an extractor *and* investigating integration into the crawler.

> New TikaCmdLineMetExtractor
> ---------------------------
>
>                 Key: OODT-652
>                 URL: https://issues.apache.org/jira/browse/OODT-652
>             Project: OODT
>          Issue Type: New Feature
>          Components: metadata container
>    Affects Versions: 0.6
>            Reporter: Rishi Verma
>            Assignee: Rishi Verma
>             Fix For: 0.7
>
>         Attachments: extractor-config.properties, OODT-652.rverma.08-27-2013.patch.txt
>
>
> Often, we want to ingest a product and have some basic metadata automatically extracted from it without much effort. The Apache Tika project can detect and extract the metadata associated with a product. The purpose of this issue is to integrate these Tika metadata extraction capabilities so that OODT can easily make use of them.
> At a minimum, this issue seeks to:
> * Incorporate and use Tika's 'parse' method to extract metadata automatically
> * Include the text content (if any) of a document inside a new metadata element dubbed 'content'. This will be useful for Lucene- and Solr-based free-text searches
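
For illustration only, here is a minimal sketch of the second goal in the description: folding a document's text into the extracted metadata under a 'content' element. It is not taken from the attached patch. The class name ContentMetadataSketch and the method withContent are hypothetical, and the Tika parse step itself is left out so the sketch has no dependencies; the extracted map and bodyText argument stand in for whatever a Tika parse would return. A plain multi-valued map is used in place of the OODT and Tika metadata classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the OODT-652 patch): fold text produced by a
 * parse step into a multi-valued metadata map under the key "content".
 */
public class ContentMetadataSketch {

    // Element name taken from the issue description.
    static final String CONTENT_KEY = "content";

    /**
     * Returns a copy of the extracted metadata with the trimmed body text
     * added under "content". Null or blank text adds no element. An
     * existing "content" entry in the input is overwritten (an assumption
     * of this sketch, not a documented behaviour).
     */
    static Map<String, List<String>> withContent(
            Map<String, List<String>> extracted, String bodyText) {
        // Copy every key and value list so the caller's map is untouched.
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : extracted.entrySet()) {
            merged.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        if (bodyText != null && !bodyText.trim().isEmpty()) {
            List<String> values = new ArrayList<>();
            values.add(bodyText.trim());
            merged.put(CONTENT_KEY, values);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Stand-in values for what a parse step might have produced.
        Map<String, List<String>> extracted = new LinkedHashMap<>();
        extracted.put("Content-Type", List.of("text/plain"));

        Map<String, List<String>> met = withContent(extracted, "hello world\n");
        System.out.println(met);
    }
}
```

Skipping the element for blank text matches the "(if any)" wording in the description: products with no extractable text, such as binary files, would simply carry no 'content' entry for Lucene or Solr to index.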