tika-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Questions
Date Fri, 29 Jun 2007 20:57:56 GMT
Hey Gang,

I was wondering if you had a todo list or something somewhere?  I  
have been loosely following the discussions here and see the general  
outline of what the goals are here:
http://www.mail-archive.com/tika-dev@incubator.apache.org/msg00024.html
(Tika discussions in Amsterdam)

Here's where I am at:  I am considering extracting the Nutch parsing  
plugins for a project I am undertaking and wrapping them for my own  
purposes, but knowing Tika is around, I would just as soon do this in  
the context of Tika, or at least try to help out that way and have it  
become a part of Tika.  I have not looked at Lius yet.  I guess I am  
wondering if you have some interfaces in mind that you want to hook  
into, or is the Nutch model (or Lius model) already going to serve as  
the main model?  I pretty much think the Nutch model has everything I  
need at the moment, but I don't want to carry around the whole set of  
Nutch dependencies.  I am not worried about content detection at this  
point so much as extraction.

Is the plan to adopt a similar plugin approach as Nutch?

So, I guess the question is what can I do at this point to help?   
Should I just go ahead with my needs and then give it back as a patch  
and you can decide what to do with it from there?  I am in somewhat  
of a hurry to get the basics working in the next week or so.

Also, anyone have any recommendations for parsing various mail  
repositories like Outlook, Mac Mail (which I think is mbox), etc.?

