nutch-dev mailing list archives

From lewis john mcgibbney
Subject [ANNOUNCE] Apache Nutch 1.5 Released
Date Thu, 07 Jun 2012 16:52:32 GMT
(apologies for cross posting...)

Good Afternoon Everyone,

The 1.5 release of Nutch is now available. This release brings
upgrades to several major components, including Tika 1.1 and Hadoop
1.0.0, improvements to the LinkRank and WebGraph elements, and a number
of new plugins covering blacklisting, filtering and parsing, to name a
few. Please see the list of changes made in this version for a full
breakdown of the 50-odd improvements the release boasts. A full PMC
release statement can be found below.

Apache Nutch is an open source web-search software project. Stemming
from Apache Lucene, it now builds on Apache Solr, adding web-specifics
such as a crawler, a link-graph database and parsing support handled
by Apache Tika for HTML and an array of other document formats. Nutch
can run on a single machine, but gains a lot of its strength from
running in a Hadoop cluster. The system can be enhanced (e.g. other
document formats can be parsed) using a highly flexible, easily
extensible and thoroughly maintained plugin infrastructure.

Nutch is available in source and binary form (zip and tar.gz) from the following
download page:

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads
using signatures found on the Apache site:

For more information on Apache Nutch, visit the project home page:

Thank you very much

Lewis John McGibbney (on behalf of the Apache Nutch community)
