nutch-dev mailing list archives

From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: Suitable Nutch 2.0 Project Description
Date Wed, 13 Jun 2012 16:42:46 GMT
+1 to the description w/o experimental too (I agree with Ferdy).

You guys ROCK.

Cheers,
Chris

On Jun 13, 2012, at 5:29 AM, Lewis John Mcgibbney wrote:

> Hi,
> 
> Seeing as we have the ball rolling with the 2.0 RC, I thought I'd ask
> about a suitable project descriptor.
> 
> So far on trunk we have
> 
> ** Apache Nutch is an open source web-search software project.
> Stemming from Apache Lucene, it now builds on Apache Solr adding
> web-specifics, such as a crawler, a link-graph database and parsing
> support handled by Apache Tika for HTML and an array of other document
> formats.
> 
> This is merely a pot shot, but I was thinking for Nutch 2.0, something like
> 
> ** Apache Nutch 2.X is an experimental branch of the Apache Nutch open
> source web-search software project. It builds on Apache Gora for data
> persistence and Apache Solr for indexing, adding web-specifics, such as
> a crawler, a link-graph database and parsing support handled by Apache
> Tika for HTML and an array of other document formats.
> 
> Although there are not many changes here, I just wanted to run it by
> you folks...?
> 
> Thanks
> Lewis
> 
> -- 
> Lewis


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

