tika-dev mailing list archives

From "Bertrand Delacretaz" <bdelacre...@apache.org>
Subject Re: how do you see tika working
Date Wed, 30 May 2007 07:15:14 GMT
On 5/29/07, Ian Holsman <lists@holsman.net> wrote:

> ...What I was planning to do was use the nutch tool to fetch the URL data
> into segments, and then write a custom tool to extract the HTML out of
> the segment and run it through my code, similar to what the 'crawl'
> does, but dumping the metrics into a mysql DB.
> Is this similar to what you guys had in mind with Tika?...

I think so: the "extract the HTML" part would be a standard Tika
plugin, and your metrics stuff would be a custom plugin.
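
To make the standard-plugin/custom-plugin split concrete, here is a minimal
sketch in Python. It is not Tika or Nutch code: every name in it
(extract_text_plugin, word_count_plugin, run_pipeline, store_metrics) is
hypothetical, the document is modelled as a plain dict, and an in-memory
SQLite database stands in for the MySQL DB mentioned above.

```python
# Hypothetical sketch: none of these names come from Tika or Nutch.
from html.parser import HTMLParser
import sqlite3


class _TextCollector(HTMLParser):
    """Collects the character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text_plugin(doc):
    """Stand-in for the 'standard' step: HTML in, whitespace-normalized text out."""
    collector = _TextCollector()
    collector.feed(doc["html"])
    collector.close()
    doc["text"] = " ".join(" ".join(collector.chunks).split())
    return doc


def word_count_plugin(doc):
    """Stand-in for the 'custom' step: derives one metric from the extracted text."""
    doc["metrics"] = {"word_count": len(doc["text"].split())}
    return doc


def run_pipeline(doc, plugins):
    """Passes the document through each plugin in order."""
    for plugin in plugins:
        doc = plugin(doc)
    return doc


def store_metrics(conn, doc):
    """Writes one row per metric (SQLite here as an assumed stand-in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metrics (url TEXT, name TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [(doc["url"], name, value) for name, value in doc["metrics"].items()],
    )
    conn.commit()
```

The fetch step (Nutch segments in the original plan) is left out; the sketch
assumes the HTML is already in hand. Keeping each step a function from
document to document means the metrics step can be swapped without touching
the extraction step.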

