tika-dev mailing list archives

From Ian Holsman <li...@holsman.net>
Subject Using Tika/Nutch to analyze a website
Date Mon, 16 Apr 2007 07:10:27 GMT
Hi.

I was planning on using Nutch and UIMA to analyze a website and perform
entity extraction, and noticed that you mention that Tika would be
designed to do this.

I was wondering how things are going with Tika, as there don't seem to
be any code or design plans checked in (apart from the proposal).

So I would like to spark the discussion.

I would like to:
- use Nutch to fetch the pages (HTML) from the site
- use UIMA to analyze them and extract interesting information
- use MySQL, or possibly HBase, to store versioned/historical output of
  this analysis for further reporting (stats and page timelines)

Is Tika going to be able to do this for me?
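
To make the three steps above concrete, here is a rough sketch of the data flow I have in mind. It is only an illustration: it does not use the Nutch, UIMA, or Tika APIs at all. The function names (extract_entities, open_store, store_version, history) are made up, the capitalized-word regex is a toy stand-in for a real UIMA annotator, and an in-memory SQLite table stands in for MySQL/HBase.

```python
import re
import sqlite3
import time


def extract_entities(html):
    """Toy stand-in for an analysis step (hypothetical, not UIMA):
    strip tags, then treat capitalized words as 'entities'."""
    text = re.sub(r"<[^>]+>", " ", html)
    return sorted(set(re.findall(r"\b[A-Z][a-z]+\b", text)))


def open_store():
    """Create an in-memory table that keeps one row per (url, version)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE analysis ("
        " url TEXT, version INTEGER, fetched_at REAL, entities TEXT,"
        " PRIMARY KEY (url, version))"
    )
    return db


def store_version(db, url, entities, fetched_at=None):
    """Append a new version of the analysis for url; old rows are kept."""
    row = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM analysis WHERE url = ?",
        (url,),
    ).fetchone()
    version = row[0] + 1
    if fetched_at is None:
        fetched_at = time.time()
    db.execute(
        "INSERT INTO analysis VALUES (?, ?, ?, ?)",
        (url, version, fetched_at, ",".join(entities)),
    )
    return version


def history(db, url):
    """Return the page timeline as a list of (version, entities) rows."""
    return db.execute(
        "SELECT version, entities FROM analysis WHERE url = ? ORDER BY version",
        (url,),
    ).fetchall()
```

Each re-fetch of a page would call store_version again, so history(db, url) gives the page timeline that the reporting step would read from.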

Regards,
Ian
--
Ian Holsman
Ian@Holsman.net