nutch-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Suitable Nutch 2.0 Project Description
Date Wed, 13 Jun 2012 12:29:29 GMT
Hi,

Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
about a suitable project descriptor.

So far on trunk we have

** Apache Nutch is an open source web-search software project.
Stemming from Apache Lucene, it now builds on Apache Solr, adding
web-specifics such as a crawler, a link-graph database and parsing
support handled by Apache Tika for HTML and an array of other document
formats.

This is merely a pot shot, but I was thinking for Nutch 2.0, something like

** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
source web-search software project. It builds on Apache Gora for data
persistence and Apache Solr for indexing, adding web-specifics such as
a crawler, a link-graph database and parsing support handled by Apache
Tika for HTML and an array of other document formats.

Although there are not many changes here, I just wanted to run it by
you folks...?

Thanks
Lewis

-- 
Lewis
