nutch-dev mailing list archives

From Lewis John Mcgibbney <>
Subject [ANNOUNCEMENT] Apache Nutch v2.0 Release
Date Sat, 07 Jul 2012 22:37:22 GMT
(apologies for cross posting...)

Good Evening Everyone,

The Apache Nutch PMC are very pleased to announce the release of
Apache Nutch v2.0. This release offers users an edition focused on
large-scale crawling, built on storage abstraction (via Apache
Gora™) for big data stores such as Apache Accumulo™, Apache Avro™,
Apache Cassandra™, Apache HBase™, HDFS™, an in-memory data store and
various high-profile SQL stores. After some two years of development,
Nutch v2.0 also offers all of the mainstream Nutch functionality and
builds on Apache Solr™, adding web specifics such as a crawler, a
link-graph database and parsing support (handled by Apache Tika™) for
HTML and an array of other document formats. Nutch v2.0 shadows the
latest stable mainstream release (v1.5.X), which is based on Apache
Hadoop™ and covers many use cases, from small crawls on a single
machine to large-scale deployments on Hadoop clusters. Please see the
list of changes made in this version for a full breakdown. A full PMC
release statement can be found below.

Nutch v2.0 is available in source (zip and tar.gz) from the
following download page:

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads using signatures found on the Apache site:

For more information on Apache Nutch, visit the project home page:

Thank you very much

Lewis John McGibbney (on behalf of the Apache Nutch community)


View raw message