tika-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-433) Tika + Hadoop
Date Wed, 26 May 2010 13:32:56 GMT

    [ https://issues.apache.org/jira/browse/TIKA-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871742#action_12871742 ]

Yonik Seeley commented on TIKA-433:
-----------------------------------

From the peanut gallery, Lucene has gone down the contrib path in the past, and I wouldn't
recommend it.  There are tons of places to host projects these days, and it may make more
sense to be hosted as a separate project.


> Tika + Hadoop
> -------------
>
>                 Key: TIKA-433
>                 URL: https://issues.apache.org/jira/browse/TIKA-433
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>            Reporter: Grant Ingersoll
>            Priority: Minor
>
> Would be great to have a Tika contrib that took in an HDFS location with "rich" documents
> on it and an output format (or output processor) and converted the docs to XHTML or Solr or
> whatever.  Seems like it should be pretty straightforward to do on the Hadoop side of things.
> Only tricky part, I suppose, is the output format and how to make that pluggable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

