tika-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Questions
Date Fri, 29 Jun 2007 21:06:22 GMT
Also, please feel free to tell me I am getting too far ahead of
things...  :-)

On Jun 29, 2007, at 4:57 PM, Grant Ingersoll wrote:

> Hey Gang,
> I was wondering if you had a todo list or something somewhere?  I  
> have been loosely following the discussions here and see the  
> general outline of what the goals are here:
> http://www.mail-archive.com/tika-dev@incubator.apache.org/msg00024.html
> (Tika discussions in Amsterdam)
> Here's where I am at:  I am considering extracting the Nutch  
> parsing plugins for a project I am undertaking and wrapping them  
> for my own purposes, but knowing Tika is around, I would just as  
> soon do this in the context of Tika, or at least try to help out  
> that way and have it become a part of Tika.  I have not looked at  
> Lius yet.  I guess I am wondering if you have some interfaces in  
> mind that you want to hook into, or is the Nutch model (or Lius  
> model) already going to serve as the main model?  I pretty much  
> think the Nutch model has everything I need at the moment, but I  
> don't want to carry around the whole set of Nutch dependencies.  I  
> am not worried about content detection at this point so much as  
> extraction.
> Is the plan to adopt a similar plugin approach as Nutch?
> So, I guess the question is what can I do at this point to help?   
> Should I just go ahead with my needs and then give it back as a  
> patch and you can decide what to do with it from there?  I am in
> somewhat of a hurry to get the basics working in the next week or so.
> Also, anyone have any recommendations for parsing various mail  
> repositories like Outlook, Mac Mail (which I think is mbox), etc.?
> Cheers,
> Grant
