nutch-dev mailing list archives

From Apache Wiki <>
Subject [Nutch Wiki] Trivial Update of "GORA_HBase" by LewisJohnMcgibbney
Date Wed, 13 Jun 2012 10:57:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "GORA_HBase" page has been changed by LewisJohnMcgibbney:

- This document describes how to get Nutch to use HBase as a backend for GORA and is based
on the revision 993857 of the Nutch trunk
+ = Nutch 2.0 Tutorial =
+ {{}} {{}}
-  * Install and configure HBase 0.20.6. You can check it out from [[|here]]
('''N.B.''' It is important that you grab HBase version 0.20.6 as this is supported by Gora)
+ This document describes how to get Nutch 2.0 to use HBase as a storage backend for Gora.
+  * Install and configure HBase. You can get it [[|here]]
('''N.B.''' Gora 0.2 uses HBase 0.90.4; however, the setup is known to work with more recent
versions of HBase.)
   * Specify the GORA backend in nutch-site.xml
@@ -12, +15 @@

   <description>Default class for storing data</description>
- Note: Currently HBaseStore is NOT YET THREAD-SAFE, so all processes should have single threaded
settings (i.e. set number of fetchers to 1). Work to make it thread-safe is in progress.
   * Compile Nutch -> ant runtime
   * Make sure HBase is started and working properly as per the quick start tutorial [[|here]]
@@ -24, +26 @@

    nutch readdb
- You should find more details in the logs on ''$NUTCH_HOME/runtime/local/logs/hadoop.log''
+ You should find more details in the logs on ''$NUTCH_HOME/runtime/local/logs/hadoop.log''.
+ For more details on the command-line interface options, please see [[|here]],
or run ./bin/nutch, which prints usage to stdout.
+ Finally, for a more detailed Nutch (1.X) tutorial, please see [[|here]]
+ '''back to FrontPage'''
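
For reference, the "Specify the GORA backend in nutch-site.xml" step could look roughly like the fragment below. Only the description text appears in the diff above; the property name and the HBaseStore class name are assumptions based on the usual Nutch 2.x and Gora naming, so check them against the nutch-default.xml shipped with your checkout.

```xml
<!-- Sketch only: goes inside the <configuration> element of conf/nutch-site.xml. -->
<!-- The name and value below are assumed, not taken from the diff above. -->
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
```

After changing this file, Nutch needs to be recompiled with ant runtime, as the page describes, so that the setting is copied into runtime/local/conf.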
