tika-dev mailing list archives

From <m...@searcharea.co.uk>
Subject Re: Questions
Date Fri, 29 Jun 2007 22:05:45 GMT
>>Also, anyone have any recommendations for parsing various mail 
>>repositories like Outlook, Mac Mail (which I think is mbox), etc.?

"mstor" is a JavaMail provider that should do a good job of handling 
mbox parsing for you. I've used it, but it looks like the license isn't 
Apache :( http://mstor.sourceforge.net/
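
If the license turns out to be a blocker, the basic mbox layout is simple enough to split by hand. What follows is only a rough sketch using the plain JDK, not mstor's API or anything from Tika: the class name MboxSplitter and its split method are made up for illustration. It assumes the common convention that each message starts with a separator line beginning with "From " and does not undo ">From " quoting inside message bodies.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of mstor, JavaMail, or Tika.
public class MboxSplitter {

    /**
     * Splits mbox text into raw messages (headers plus body).
     * A line starting with "From " is treated as a message separator
     * and is not included in the returned message text.
     */
    public static List<String> split(String mbox) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;

        for (String line : mbox.split("\r?\n")) {
            if (line.startsWith("From ")) {
                // A new separator closes the message collected so far.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
                continue;
            }
            if (current == null) {
                continue; // ignore anything before the first separator
            }
            current.append(line).append('\n');
        }

        // Flush the last message, if any.
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }
}
```

Each returned string could then be handed to whatever message parser is in use. Real mailboxes vary (mboxo, mboxrd, Content-Length based variants), so a maintained library is still the safer choice where its license permits.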

I'm not up to speed with the latest Tika developments, for which I must 
apologise - I've been buried in other work since its inception.


----- Original Message ----- 
From: "Grant Ingersoll" <gsingers@apache.org>
To: <tika-dev@incubator.apache.org>
Sent: Friday, June 29, 2007 9:57 PM
Subject: Questions

> Hey Gang,
> I was wondering if you had a todo list or something somewhere? I have
> been loosely following the discussions here and see the general outline
> of what the goals are here:
> http://www.mail-archive.com/tika-dev@incubator.apache.org/msg00024.html
> (Tika discussions in Amsterdam)
> Here's where I am at: I am considering extracting the Nutch parsing
> plugins for a project I am undertaking and wrapping them for my own
> purposes, but knowing Tika is around, I would just as soon do this in the
> context of Tika, or at least try to help out that way and have it become
> a part of Tika. I have not looked at Lius yet. I guess I am wondering
> if you have some interfaces in mind that you want to hook into, or is the
> Nutch model (or Lius model) already going to serve as the main model? I
> pretty much think the Nutch model has everything I need at the moment,
> but I don't want to carry around the whole set of Nutch dependencies. I
> am not worried about content detection at this point so much as
> extraction.
> Is the plan to adopt a similar plugin approach as Nutch?
> So, I guess the question is what can I do at this point to help? Should
> I just go ahead with my needs and then give it back as a patch and you
> can decide what to do with it from there? I am in somewhat of a hurry
> to get the basics working in the next week or so.
> Also, anyone have any recommendations for parsing various mail 
> repositories like Outlook, Mac Mail (which I think is mbox), etc.?
> Cheers,
> Grant
