tika-dev mailing list archives

From Ian Holsman <li...@holsman.net>
Subject how do you see tika working
Date Tue, 29 May 2007 01:05:13 GMT

There hasn't been much activity on this list, so I thought I'd throw out 
my idea on how I see my little side project working.

I have to write a web content analysis tool which will take the HTML
from a site and compute various metrics from it (e.g. page size, number
of JS calls, etc.).
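
As a rough sketch of the kind of metrics meant here, the snippet below computes a page size and a script count from an HTML string using only Python's standard library. It is an illustration under assumptions, not the actual tool: "JS calls" is interpreted as the number of <script> elements, and the names ScriptCounter and page_metrics are hypothetical.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Counts <script> elements, split into external (src=...) and inline."""

    def __init__(self):
        super().__init__()
        self.external_scripts = 0
        self.inline_scripts = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "script":
            return
        if any(name == "src" for name, _ in attrs):
            self.external_scripts += 1
        else:
            self.inline_scripts += 1


def page_metrics(html: str) -> dict:
    """Return a few simple metrics for one HTML document (hypothetical helper)."""
    counter = ScriptCounter()
    counter.feed(html)
    counter.close()
    return {
        # Assumption: size is measured as the UTF-8 encoded length of the markup.
        "page_size_bytes": len(html.encode("utf-8")),
        "external_scripts": counter.external_scripts,
        "inline_scripts": counter.inline_scripts,
    }
```

Each returned dictionary maps naturally onto one row per page, which is the shape a metrics table in a database would need.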

What I was planning to do was use Nutch to fetch the URL data
into segments, and then write a custom tool to extract the HTML out of
each segment and run it through my code, similar to what the 'crawl'
command does, but dumping the metrics into a MySQL database.

Is this similar to what you guys had in mind with Tika?

