nutch-dev mailing list archives

From "Jukka Zitting"
Subject Re: Library for extracting text content from binaries
Date Tue, 25 Jul 2006 06:54:03 GMT

On 7/24/06, Chris Mattmann wrote:
> Thanks for your email. Jerome Charron and I proposed a project with a
> similar goal in mind that we wanted to dub "Tika". Tika would effectively be
> a Lucene sub-project, and would factor out some of the capabilities you
> mention below from Nutch, incl:

Sounds very useful! Jackrabbit could certainly use not only the
generalized parser functionality but also the other proposed features
like language identifiers, etc. Count me in.

> If you're interested in this idea, maybe it would be a good idea to contact Jerome
> and me off-list, and maybe we could get going on a proposal.



Jukka Zitting

Yukatan
Software craftsmanship, JCR consulting, and Java development
