nutch-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: Library for extracting text content from binaries
Date Tue, 25 Jul 2006 06:54:03 GMT
Hi,

On 7/24/06, Chris Mattmann <chris.mattmann@jpl.nasa.gov> wrote:
> Thanks for your email. Jerome Charron and I proposed a project with a
> similar goal in mind that we wanted to dub "Tika". Tika would effectively be
> a Lucene sub-project, and would factor out some of the capabilities you
> mention below from Nutch, incl:

Sounds very useful! Jackrabbit could certainly use not only the
generalized parser functionality but also the other proposed features
like language identifiers, etc. Count me in.

> If you're interested in this idea, maybe it would be a good idea to contact Jerome
> and I off-list, and maybe we could get going on a proposal.

OK.

BR,

Jukka Zitting

-- 
Yukatan - http://yukatan.fi/ - info@yukatan.fi
Software craftsmanship, JCR consulting, and Java development
