nutch-dev mailing list archives

From Apache Wiki
Subject [Nutch Wiki] Trivial Update of "Nutch2Tutorial" by LewisJohnMcgibbney
Date Wed, 13 Jun 2012 11:05:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Nutch2Tutorial" page has been changed by LewisJohnMcgibbney:

New page:
= Nutch 2.0 Tutorial =

This document describes how to get Nutch 2.0 to use HBase as a storage backend for Gora.

 * Grab a distribution of Nutch 2.X from [[|here]]
 * Install and configure HBase. You can get it [[|here]]
('''N.B.''' Gora 0.2 uses HBase 0.90.4; however, the setup is known to work with more recent versions of HBase.)
 * Specify the GORA backend in nutch-site.xml

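 <property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
 </property>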

 * Ensure the HBase gora-hbase dependency is available in ivy/ivy.xml

    <!-- Uncomment this to use HBase as Gora backend. -->
    <dependency org="org.apache.gora" name="gora-hbase" rev="0.2" conf="*->default" />

 * Compile Nutch by running ''ant runtime''
 * Make sure HBase is started and working properly as per the quick start tutorial [[|here]]
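
One possible way to check that HBase is up before running Nutch is sketched below. It assumes that `HBASE_HOME` points at your HBase installation (otherwise `hbase` is looked up on the `PATH`); the variable `HBASE_CHECK` is purely illustrative and not part of Nutch or HBase.

```shell
#!/bin/sh
# Assumption: HBASE_HOME, if set, points at the HBase install directory.
# If it is unset, fall back to whatever "hbase" is on the PATH.
HBASE_BIN="${HBASE_HOME:+$HBASE_HOME/bin/}hbase"

if command -v "$HBASE_BIN" >/dev/null 2>&1; then
  # "status" is an HBase shell command that reports the cluster's servers.
  echo "status" | "$HBASE_BIN" shell
  HBASE_CHECK="ran"
else
  echo "hbase not found; set HBASE_HOME or add hbase to PATH"
  HBASE_CHECK="skipped"
fi
```

If the status command cannot reach a master, HBase is probably not running yet, and the quick start tutorial linked above is the place to look.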

You should then be able to use it. Try going to ''$NUTCH_HOME/runtime/local/bin'' and running:

  nutch inject /someseedDir
  nutch readdb

You should find more details in the log file at ''$NUTCH_HOME/runtime/local/logs/hadoop.log''.
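
The inject and readdb steps above could be scripted roughly as follows. This is only a sketch: the seed directory `./urls`, the file name `seed.txt`, and the seed URL are hypothetical, and `-stats` is assumed to be among the options that `nutch readdb` lists in its usage output for your build.

```shell
#!/bin/sh
# Hypothetical seed directory and file; adjust to your own layout.
SEED_DIR="${SEED_DIR:-./urls}"
mkdir -p "$SEED_DIR"
# One URL per line; this seed URL is only an example.
printf '%s\n' 'http://nutch.apache.org/' > "$SEED_DIR/seed.txt"

# Location of the nutch script after "ant runtime" (local mode).
NUTCH_BIN="${NUTCH_HOME:-.}/runtime/local/bin/nutch"

if [ -x "$NUTCH_BIN" ]; then
  # Inject the seed URLs into the storage backend, then print statistics.
  "$NUTCH_BIN" inject "$SEED_DIR"
  "$NUTCH_BIN" readdb -stats
else
  echo "nutch not found at $NUTCH_BIN; build it with 'ant runtime' first"
fi
```

If either command fails, ''hadoop.log'' in the logs directory mentioned above is the first place to check.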

For more details of the command-line interface options, please see [[|here]],
or run ''./bin/nutch'', which prints usage to stdout.
Finally, for a more detailed Nutch (1.X) tutorial, please see [[|here]].

'''back to FrontPage'''
